Abstract

We consider the situation in which a transmitter attempts to communicate reliably over a discrete memoryless 
channel while simultaneously ensuring covertness (low probability of detection) with respect to a warden, who 
observes the signals through another discrete memoryless channel. We develop a coding scheme based on the 
principle of channel resolvability, which generalizes and extends prior work in several directions. First, it shows that, 
irrespective of the quality of the channels, it is possible to communicate on the order of $\sqrt{n}$ reliable and covert bits 
over n channel uses if the transmitter and the receiver share on the order of $\sqrt{n}$ key bits; this improves upon earlier 
results requiring on the order of $\sqrt{n}\log n$ key bits. Second, it proves that, if the receiver’s channel is “better” than 
the warden’s channel in a sense that we make precise, it is possible to communicate on the order of $\sqrt{n}$ reliable and 
covert bits over n channel uses without a secret key; this generalizes earlier results established for binary symmetric 
channels. We also identify the fundamental limits of covert and secret communications in terms of the optimal 
asymptotic scaling of the message size and key size, and we extend the analysis to Gaussian channels. The main 
technical problem that we address is how to develop concentration inequalities for “low-weight” sequences; the crux 
of our approach is to define suitably modified typical sets that are amenable to concentration inequalities. 


I. Introduction 

The benefits offered by ubiquitous communication networks are now mitigated by the relative ease with which 
malicious users can interfere or tamper with sensitive data. The past decade has thus witnessed a growing concern for 
the issues of privacy, confidentiality, and integrity of communications. In many instances, users in a communication 
network find themselves in a position in which they wish to communicate without being detected by others. Such 
situations include fairly innocuous scenarios of dynamic spectrum access in wireless channels, in which secondary 
users attempt to communicate without being detected by primary users. A perhaps more adversarial example is a 
situation in which a user wishes to convey information covertly, either to maintain his privacy, avoid attacks, or 
escape the attention of regulatory entities monitoring the network. 

Motivated by these challenges, [2], [3] have established the first characterization of the throughput at which two 
users may communicate reliably over a noisy channel while guaranteeing a low probability of detection from a 
warden, who observes the transmitted signal through another noisy channel. Specifically, it has been shown that 
arbitrarily low probability of detection over pure loss quantum channels, thermal noise quantum channels, and 
classical Gaussian channels, is possible as long as one communicates at most on the order of $\sqrt{n}$ bits over n 
uses of the channel; this scaling result has recently been refined to establish the optimal asymptotic throughput of 
covert and reliable communication [4], [5]. One notable characteristic of the covert communication scheme in [2], 
which we revisit in the present paper, is to require a secret key between the legitimate users with size on the order 
of $\sqrt{n}\log n$. These fundamental limits on covert communication may be viewed as the counterparts of the “square 
root law” of steganography [6] when the message is embedded in a covertext with zero mean. The results of [2], 
[3] have been further extended in several directions, in particular by showing that arbitrarily small probability of 
detection is possible without a secret key when all users are connected by Binary Symmetric Channels (BSCs) and 
provided the warden’s BSC noise is much larger than the legitimate users’ BSC noise [7]; this result was also extended 
to include secrecy constraints [8]. Other extensions have attempted to identify scenarios in which the “square root 
law” may be beaten, which include situations in which the channel statistics are imperfectly known [9], [10], [11], 
[12], or when the warden has uncertainty about the time of communication [13], [14]. The ideas underlying the 
keyless coding scheme are also connected to those developed for “stealth” and channel resolvability in the context 
of wiretap channels [15], [16]. Tutorial presentations and discussions of these results may be found in [17], [18]. 
In the remainder of the paper, we use the terminology “covert communication” as a synonym for low-probability 
of detection |[T^, [2], ll^, deniability [7], [8], and undetectable communication [10], [11], since all terms refer 
to the same definition. The main conceptual contribution of the present work is to revisit the problem of covert 
communication from the perspective of resolvability [20], [21]. This conceptual connection allows us to establish 
the following technical results that extend earlier work. 

• We revisit the coding scheme of [2] that shows that on the order of $\sqrt{n}$ reliable and covert bits may be 
communicated over n channel uses with on the order of $\sqrt{n}\log n$ bits of secret key in a universal manner; this 
is essentially a variation of [2] with a technical refinement (Theorem 1, Corollary 1). 

• We develop an alternative coding scheme such that, if the warden’s channel statistics are known, on the order 
of $\sqrt{n}$ reliable covert bits may be communicated over n channel uses with only on the order of $\sqrt{n}$ bits of 
secret key. In addition, if the legitimate user’s channel is “better” than the warden’s channel, in a sense that 
is made precise in Section V, we show that no secret key is needed; in particular, this generalizes [7] to all 
Discrete Memoryless Channels (DMCs) (Theorem Corollary |^. 

• We show that both the key size and the message size in our scheme are asymptotically optimal for DMCs by 
adapting and extending the recent converse results of Wang et al. [4J, lO (Theorem]^ TheoremCorollary [^. 

• We extend the proposed covert communication scheme to include secrecy constraints (Theorem [^. 

• We partially extend the results to continuous channels, and in particular to Additive White Gaussian Noise 
(AWGN) channels (Theorem [^. 


The underlying technical problem that we solve is how to develop random coding arguments for “low-weight” 
codewords, in a sense that is precisely defined in Section III-B, for which naive concentration inequalities, such as 
Hoeffding’s inequality, do not seem to apply. The crux of our approach is to define modified “typical sets” that are 
amenable to concentration inequalities, which was inspired by an astute technique in fj] to “concentrate” the sum 
of n independent and identically distributed (i.i.d.) random variables over a sum of $\sqrt{n}$ terms. 

The paper is organized as follows. Section III formally introduces the problem of covert communication, sets the 
notation, and establishes a few preliminary results that justify the proposed conceptual approach. Section IV revisits 
the covert communication scheme of [2] from the perspective of source resolvability, while Section V develops an 
alternative scheme using channel resolvability that turns out to be optimal. Section VI develops the converse proof 
required to justify the optimality of the proposed scheme. Section VII presents several applications and extensions 
of the results, including Gaussian channels. 


II. Notation 

We briefly introduce the notation used throughout the paper. Random variables are denoted by upper case letters, 
e.g., X, while their realizations are denoted by lowercase letters, e.g., x. Vectors are denoted by boldface fonts, e.g., $\mathbf{X}$ 
and $\mathbf{x}$. When the length of the vector is not included as an exponent, it is implicitly assumed that vectors are of 
length $n \in \mathbb{N}^*$, i.e., $\mathbf{X} = (X_1, \dots, X_n)$. 

In all our calculations, log and exp are understood to the base e so that the underlying unit is a nat. However, 
we allow ourselves to interpret and discuss our results in bits by converting log to the base two. For any $x \in \mathbb{R}$, we 
define $[x]^+ = \max(x, 0)$. 

For two distributions P, Q on some alphabet $\mathcal{X}$, $\mathbb{D}(P\|Q) = \sum_{x} P(x)\log\frac{P(x)}{Q(x)}$ is the Kullback-Leibler (KL) 
divergence between P and Q, and $\mathbb{V}(P,Q) = \frac{1}{2}\sum_{x}|P(x) - Q(x)|$ is the total variation between P and Q. Pinsker’s 
inequality ensures that $\mathbb{V}(P,Q)^2 \leq \frac{1}{2}\mathbb{D}(P\|Q)$, which we will loosen as $\mathbb{V}(P,Q)^2 \leq \mathbb{D}(P\|Q)$ for simplicity. We say 
that P is absolutely continuous with respect to (w.r.t.) Q, denoted $P \ll Q$, if for all $x \in \mathcal{X}$, $P(x) = 0$ if $Q(x) = 0$. 
We also denote by $P^{\otimes n}$ the product distribution $\prod_{i=1}^{n} P$. 

For the reader’s convenience, Table I also provides a summary of the notation often used throughout the paper. 

III. Covert communication over noisy channels 

We consider the situation illustrated in Fig. 1, in which two legitimate users, Alice and Bob, attempt to 
communicate over a DMC $(\mathcal{X}, W_{Y|X}, \mathcal{Y})$ without being detected by a warden, Willie, who observes the signals 
through another DMC $(\mathcal{X}, W_{Z|X}, \mathcal{Z})$. The transition probabilities corresponding to n uses of the channel are denoted 
$W^{\otimes n}_{Y|X} = \prod_{i=1}^{n} W_{Y|X}$ and $W^{\otimes n}_{Z|X} = \prod_{i=1}^{n} W_{Z|X}$. We also make the following assumptions. 







TABLE I

Commonly used notation

Symbol                          Meaning
$\mathcal{X} = \{x_0, x_1\}$    Channel input alphabet, with innocent symbol $x_0$
$\omega_n$                      Indexed sequence with value in (0, 1)
$\alpha_n$                      $\omega_n/\sqrt{n}$
$P_0$                           Channel output distribution $W_{Y|X=x_0}$
$P_1$                           Channel output distribution $W_{Y|X=x_1}$
$Q_0$                           Channel output distribution $W_{Z|X=x_0}$
$Q_1$                           Channel output distribution $W_{Z|X=x_1}$
$\Pi_{\alpha_n}$                Channel input distribution such that $\Pi_{\alpha_n}(x_1) = \alpha_n$
$P_{\alpha_n}$                  Channel output distribution $P_{\alpha_n} = \alpha_n P_1 + (1 - \alpha_n) P_0$
$Q_{\alpha_n}$                  Channel output distribution $Q_{\alpha_n} = \alpha_n Q_1 + (1 - \alpha_n) Q_0$
$\mu_0$                         Minimum probability in support of $Q_0$, i.e., $\min_{z:Q_0(z)>0} Q_0(z)$


• There exists an innocent symbol $x_0 \in \mathcal{X}$ that corresponds to the input to the channel when no communication 
takes place. In such a case, the distributions induced by $x_0$ at the output of the two memoryless channels are 

$$P_0 \triangleq W_{Y|X=x_0} \quad\text{and}\quad Q_0 \triangleq W_{Z|X=x_0} \quad\text{with}\quad \mu_0 \triangleq \min_{z:Q_0(z)>0} Q_0(z). \tag{1}$$

• There exists another symbol $x_1 \in \mathcal{X}$ with $x_1 \neq x_0$, and we define the distributions induced by $x_1$ at the output 
of the memoryless channels 

$$P_1 \triangleq W_{Y|X=x_1} \quad\text{and}\quad Q_1 \triangleq W_{Z|X=x_1}. \tag{2}$$

• $Q_1 \ll Q_0$ and $Q_1 \neq Q_0$, which ensures that the problem is not trivial, by excluding the situations in which 
Willie would always detect transmission with non-vanishing probability or would never detect it. As shown in 
the appendix, Alice and Bob would then communicate zero or on the order of n covert bits, respectively. 

• $P_1 \ll P_0$, which guarantees that Bob does not obtain an unfair advantage over Willie, by excluding the situation 
in which Bob could identify the location of some uncorrupted $x_1$-symbols. As shown in Appendix G, Alice 
and Bob would then communicate on the order of $\sqrt{n}\log n$ covert bits instead of $\sqrt{n}$. 


The restriction to a single symbol $x_1 \neq x_0$ eases the presentation of the results, but we shall see in Section VII-B 
that it incurs little loss of generality. We also discuss partial extensions of the results to AWGN channels in 
Section VII-D. Although the absolute continuity requirements restrict the class of channels considered, they are 
nevertheless satisfied for large classes of channels of interest. For instance, for AWGN channels, $x_0 = 0$ is the 
natural choice of the innocent symbol, and the absolute continuity requirements are satisfied. 



Fig. 1. Model of covert communication channel. 

Formally, Alice’s objective is to transmit a message W uniformly distributed in $[\![1, M]\!]$ by encoding it into a 
codeword $\mathbf{X} = (X_1, \dots, X_n)$ of n symbols with the help of a secret key S uniformly distributed in $[\![1, K]\!]$. At the 
beginning of every block of n symbols, Alice sets the value of a switch T: if T = 1, the output of the encoder 
is connected to the channel; else, if T = 0, the innocent symbol $x_0$ is sent n times through the channel. Upon 
observing a noisy version $\mathbf{Y} = (Y_1, \dots, Y_n)$ of $\mathbf{X}$ and knowing S, Bob’s objective is to form reliable estimates $\hat{T}$ 
and $\hat{W}$ of T and W, respectively. Reliability is measured by the average probability of error 

$$P^n_{\mathrm{err}} = \mathbb{E}_S\big(\mathbb{P}(\hat{W} \neq W \mid S, T = 1)\big) + \mathbb{P}(\hat{T} \neq 0 \mid T = 0). \tag{3}$$
In contrast, Willie’s goal is to perform a statistical test on his observation $\mathbf{Z} = (Z_1, \dots, Z_n)$ to decide whether Alice 
and Bob communicate (hypothesis $H_1$) or not (hypothesis $H_0$). The probability of Type I error (rejecting $H_0$ when 
true) is denoted $\alpha$, while the probability of Type II error (accepting $H_0$ when wrong) is denoted $\beta$. It is possible for 
Willie to design blind tests that ignore his channel observations, and that achieve any pair $(\alpha, \beta)$ such that $\alpha + \beta = 1$. 
Therefore, the objective of covert communication is to guarantee that Willie’s best statistical test yields a trade-off 
between $\alpha$ and $\beta$ that is not much better than that of a blind test. Specifically, let $Q_0^{\otimes n} = \prod_{i=1}^{n} Q_0$ be the product 
distribution that is expected by Willie when no communication happens, and let $\widehat{Q}^n$ be the distribution expected 
when communication takes place. It can be shown [22] that Willie’s optimal hypothesis test satisfies the tradeoff 
$\alpha + \beta \geq 1 - \sqrt{\mathbb{D}(\widehat{Q}^n\|Q_0^{\otimes n})}$. Therefore, achieving covert communication amounts to ensuring that $\mathbb{D}(\widehat{Q}^n\|Q_0^{\otimes n})$ is 
negligible. We provide further discussion of the role of $\mathbb{D}(\cdot\|\cdot)$ as a measure of covertness in Appendix A. 
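The covertness criterion above can be made concrete with a small numerical sketch. The code below assumes the total variation is defined with the factor 1/2, and it uses a hypothetical single-letter pair of distributions (`q0`, `q_hat`) rather than the n-letter distributions of the paper; the function names are illustrative only.

```python
import math

def kl_divergence(p, q):
    """D(p||q) in nats for two distributions given as lists over a common alphabet."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf  # p is not absolutely continuous w.r.t. q
            total += pi * math.log(pi / qi)
    return total

def total_variation(p, q):
    """V(p, q) = (1/2) * sum_x |p(x) - q(x)| (assumed convention)."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def detection_lower_bound(p, q):
    """Lower bound 1 - sqrt(D(p||q)) on alpha + beta, clipped at zero."""
    return max(0.0, 1.0 - math.sqrt(kl_divergence(p, q)))

# Hypothetical example: Willie expects q0 and observes a slightly perturbed q_hat.
q0 = [0.9, 0.1]
q_hat = [0.89, 0.11]
bound = detection_lower_bound(q_hat, q0)
```

When the divergence is small, `bound` is close to 1, which is the trade-off achieved by a blind test.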

Consequently, we aim to establish scalings of log M and log K with n for which there exist covert communication 
schemes with 

$$\lim_{n\to\infty} P^n_{\mathrm{err}} = 0 \quad\text{and}\quad \lim_{n\to\infty} \mathbb{D}(\widehat{Q}^n\|Q_0^{\otimes n}) = 0. \tag{4}$$


A. Covert processes 

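For $n \in \mathbb{N}^*$, let $\alpha_n \in ]0;1[$. Define the input distribution $\Pi_{\alpha_n}$ on $\{x_0, x_1\}$ such that $\Pi_{\alpha_n}(x_1) = 1 - \Pi_{\alpha_n}(x_0) = \alpha_n$, 
as well as the corresponding output distributions 

$$Q_{\alpha_n}(z) = \sum_{x} W_{Z|X}(z|x)\Pi_{\alpha_n}(x) = Q_1(z)\alpha_n + Q_0(z)(1 - \alpha_n), \tag{5}$$

$$P_{\alpha_n}(y) = \sum_{x} W_{Y|X}(y|x)\Pi_{\alpha_n}(x) = P_1(y)\alpha_n + P_0(y)(1 - \alpha_n). \tag{6}$$

Also define the product distributions 

$$\Pi_{\alpha_n}^{\otimes n} = \prod_{i=1}^{n}\Pi_{\alpha_n}, \quad Q_{\alpha_n}^{\otimes n} = \prod_{i=1}^{n} Q_{\alpha_n}, \quad\text{and}\quad P_{\alpha_n}^{\otimes n} = \prod_{i=1}^{n} P_{\alpha_n}. \tag{7}$$

Note that $Q_1 \ll Q_0$ implies $Q_{\alpha_n} \ll Q_0$ and that $P_1 \ll P_0$ implies $P_{\alpha_n} \ll P_0$. We then have the following result, 
whose proof may be found in Appendix B. 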

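Lemma 1. Let $\{\alpha_n\}_{n\geq 1}$ be such that $\alpha_n \in ]0;1[$ and $\lim_{n\to\infty}\alpha_n = 0$. Let $Q_0$ and $Q_{\alpha_n}$ be defined as per (1) 
and (5), respectively. Define for every integer $k \geq 2$ 

$$\chi_k(Q_1\|Q_0) = \sum_{z\in\mathcal{Z}} \frac{(Q_1(z) - Q_0(z))^k}{Q_0(z)^{k-1}} \quad\text{and}\quad \rho_k(Q_1\|Q_0) = \sum_{z\in\mathcal{Z}:\,Q_1(z)-Q_0(z)<0} \frac{(Q_1(z) - Q_0(z))^k}{Q_0(z)^{k-1}}. \tag{8}$$

Then, for any $n \in \mathbb{N}^*$, 

$$\mathbb{D}(Q_{\alpha_n}\|Q_0) \leq \frac{\alpha_n^2}{2}\chi_2(Q_1\|Q_0) - \frac{\alpha_n^3}{6}\chi_3(Q_1\|Q_0) + \frac{\alpha_n^4}{3}\chi_4(Q_1\|Q_0). \tag{9}$$

For n large enough, 

$$\mathbb{D}(Q_{\alpha_n}\|Q_0) \geq \frac{\alpha_n^2}{2}\chi_2(Q_1\|Q_0) - \frac{\alpha_n^3}{2}\chi_3(Q_1\|Q_0) + \frac{2\alpha_n^3}{3}\rho_3(Q_1\|Q_0) + \frac{2\alpha_n^4}{3}\rho_4(Q_1\|Q_0). \tag{10}$$

Finally, consider the joint random variables $(X, Z) \in \{x_0, x_1\}\times\mathcal{Z}$ with distribution $W_{Z|X}(z|x)\Pi_{\alpha_n}(x)$. Then, 

$$I(X;Z) = \alpha_n\mathbb{D}(Q_1\|Q_0) - \mathbb{D}(Q_{\alpha_n}\|Q_0). \tag{11}$$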


Remark. The inequalities (9) and (10) may be loosened for n large enough as 

$$\frac{\alpha_n^2}{2}\chi_2(Q_1\|Q_0)\left(1 + \sqrt{\alpha_n}\right) \geq \mathbb{D}(Q_{\alpha_n}\|Q_0) \geq \frac{\alpha_n^2}{2}\chi_2(Q_1\|Q_0)\left(1 - \sqrt{\alpha_n}\right). \tag{12}$$

These bounds are not tight, and one may exhibit distributions for which the inequalities are strict. Nevertheless, this 
allows us to obtain the correct first order and second order in $\alpha_n$ of $I(X;Z)$, which is all we use in the remainder 
of the paper. The bounds also allow us to circumvent the rather painful Taylor series of $I(X;Z)$ in $\alpha_n$. 
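As a sanity check, the identity $I(X;Z) = \alpha_n\mathbb{D}(Q_1\|Q_0) - \mathbb{D}(Q_{\alpha_n}\|Q_0)$ and the two-sided estimate $\mathbb{D}(Q_{\alpha_n}\|Q_0) \approx \frac{\alpha_n^2}{2}\chi_2(Q_1\|Q_0)(1 \pm \sqrt{\alpha_n})$ can be verified numerically. The sketch below assumes a hypothetical warden channel (a BSC with crossover probability 0.2) and a small value of $\alpha_n$; it is an illustration, not part of the proof.

```python
import math

def kl(p, q):
    """KL divergence D(p||q) in nats (assumes p << q)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def chi2(q1, q0):
    """chi_2(Q1||Q0) = sum_z (Q1(z) - Q0(z))^2 / Q0(z)."""
    return sum((a - b) ** 2 / b for a, b in zip(q1, q0))

def mixture(alpha, q1, q0):
    """Q_alpha = alpha * Q1 + (1 - alpha) * Q0."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(q1, q0)]

def mutual_information(alpha, q1, q0):
    """I(X;Z) computed from its definition for the input distribution Pi_alpha."""
    qa = mixture(alpha, q1, q0)
    return alpha * kl(q1, qa) + (1.0 - alpha) * kl(q0, qa)

# Hypothetical warden channel: BSC(0.2), with x0 -> Q0 and x1 -> Q1.
q0 = [0.8, 0.2]
q1 = [0.2, 0.8]
alpha = 1e-3

d_alpha = kl(mixture(alpha, q1, q0), q0)
leading = 0.5 * alpha ** 2 * chi2(q1, q0)
identity_gap = abs(mutual_information(alpha, q1, q0) - (alpha * kl(q1, q0) - d_alpha))
```

For this choice, `d_alpha` stays within a factor $1 \pm \sqrt{\alpha}$ of `leading`, and `identity_gap` is at the level of floating-point rounding.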
For the specific choice $\alpha_n = \omega_n/\sqrt{n}$ with $\omega_n = o(1) \cap \omega(1/\sqrt{n})$ as $n \to \infty$, i.e., 

$$\lim_{n\to\infty}\omega_n = 0 \quad\text{and}\quad \lim_{n\to\infty}\omega_n\sqrt{n} = \infty, \tag{13}$$

we have 

$$\lim_{n\to\infty}\mathbb{D}(Q_{\alpha_n}^{\otimes n}\|Q_0^{\otimes n}) = \lim_{n\to\infty} n\,\mathbb{D}(Q_{\alpha_n}\|Q_0) = 0, \tag{14}$$

so that $Q_{\alpha_n}^{\otimes n}$ becomes indistinguishable from $Q_0^{\otimes n}$; therefore, we call the process $Q_{\alpha_n}^{\otimes n}$ a “covert stochastic process.” 
In addition, the realizations of the input process $\Pi_{\alpha_n}^{\otimes n}$ contain an average of $\omega_n\sqrt{n}$ realizations of the $x_1$ symbol, 
which grows to infinity with n; this opens the possibility of embedding information symbols in the channel input 
while remaining covert. Essentially, the result of Lemma 1 formalizes the intuition that the change in the distribution 
perceived by the warden is indistinguishable from statistical noise as long as the number of $x_1$ symbols transmitted 
in a sequence of n symbols does not exceed $\sqrt{n}$. The fact that a stochastic process with a non-trivial number of $x_1$ 
symbols may induce an undetectable covert stochastic process at the output of a noisy channel, suggests a generic 
principle for the design of covert communication schemes, which we formulate as follows. 


Covert communication schemes should attempt to simulate a covert stochastic process $Q_{\alpha_n}^{\otimes n}$. 


The covert communication schemes developed in Section IV and Section V correspond to different applications 
of this principle. 
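To illustrate the scaling behind a covert stochastic process, the following sketch evaluates $n\,\mathbb{D}(Q_{\alpha_n}\|Q_0)$ and the expected number $\omega_n\sqrt{n}$ of $x_1$ symbols for growing n. The choice $\omega_n = 1/\log n$ and the BSC(0.2) warden channel are assumptions made for the example only.

```python
import math

def divergence_per_letter(alpha, q1, q0):
    """D(Q_alpha || Q_0) with Q_alpha = alpha*Q1 + (1-alpha)*Q0, in nats."""
    total = 0.0
    for a, b in zip(q1, q0):
        qa = alpha * a + (1.0 - alpha) * b
        # Q_alpha(z)/Q_0(z) = 1 + alpha*(Q1(z)-Q0(z))/Q0(z); log1p keeps precision.
        total += qa * math.log1p(alpha * (a - b) / b)
    return total

def covert_process_summary(n, q1, q0):
    """Return (expected weight, n * D(Q_alpha_n || Q_0)) for omega_n = 1/log(n)."""
    omega = 1.0 / math.log(n)          # assumed example of an admissible omega_n
    alpha = omega / math.sqrt(n)
    return omega * math.sqrt(n), n * divergence_per_letter(alpha, q1, q0)

q0 = [0.8, 0.2]   # hypothetical warden output distribution for x0
q1 = [0.2, 0.8]   # hypothetical warden output distribution for x1
rows = [covert_process_summary(n, q1, q0) for n in (10**2, 10**4, 10**6, 10**8)]
weights = [w for w, _ in rows]
divergences = [d for _, d in rows]
```

In this example the expected weight grows with n while the n-letter divergence shrinks, which is the regime the design principle relies on.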


B. Technical digression: concentration inequalities with low-weight sequences 

One of the technical challenges faced when trying to deal with stochastic processes such as $\Pi_{\alpha_n}^{\otimes n}$ in (7), is that 
the naive concentration inequalities traditionally used to develop information-theoretic results do not seem to apply 
here. To be more concrete, consider the joint random variables $(\mathbf{X}, \mathbf{Z}) \in \mathcal{X}^n\times\mathcal{Z}^n$ with the product distribution 
$\prod_{i=1}^{n} W_{Z|X}(z_i|x_i)\Pi_{\alpha_n}(x_i)$; define the mutual information random variable llll 

$$\log\frac{W^{\otimes n}_{Z|X}(\mathbf{Z}|\mathbf{X})}{Q^{\otimes n}_{\alpha_n}(\mathbf{Z})} = \sum_{i=1}^{n}\log\frac{W_{Z|X}(Z_i|X_i)}{Q_{\alpha_n}(Z_i)}, \tag{15}$$

whose average is the average mutual information $I(\mathbf{X};\mathbf{Z}) = nI(X;Z)$. Assuming for simplicity that the range of 
$\log\frac{W_{Z|X}(Z|X)}{Q_{\alpha_n}(Z)}$ is a finite interval of length $\eta > 0$, Hoeffding’s inequality states that for any $\mu > 0$ 

$$\mathbb{P}\left(\left|\log\frac{W^{\otimes n}_{Z|X}(\mathbf{Z}|\mathbf{X})}{Q^{\otimes n}_{\alpha_n}(\mathbf{Z})} - nI(X;Z)\right| \geq n\mu I(X;Z)\right) \leq 2\exp\left(-\frac{2n\mu^2 I(X;Z)^2}{\eta^2}\right). \tag{16}$$

Unfortunately, this upper bound does not vanish because of the specific scaling of $I(X;Z)$ with n given in (11) of 
Lemma 1. The problem finds its roots in the “low weight” of the sequences $\mathbf{X}$, i.e., the number of $x_1$ symbols is 
on average on the order of $\omega_n\sqrt{n}$, which is sub-linear in n. 

There are, however, some concentration inequalities that are still useful and that will be exploited in virtually 
all subsequent proofs. For instance, consider a binary random sequence $\mathbf{S} \in \{0,1\}^n$ with a product distribution 
$\prod_{i=1}^{n} P_S$ such that $P_S(1) = 1 - P_S(0) = \omega_n/\sqrt{n}$. The sequence $\mathbf{S}$ is of low average weight $\omega_n\sqrt{n}$, but the application 
of a Chernoff bound ll^ Exercise 2.10] yields for any $\mu \in ]0;1[$ 

$$\mathbb{P}\left(\left|\sum_{i=1}^{n} S_i - \omega_n\sqrt{n}\right| > \mu\,\omega_n\sqrt{n}\right) \leq 2\exp\left(-\frac{\mu^2}{3}\omega_n\sqrt{n}\right), \tag{17}$$

which vanishes with our choice of $\omega_n$. The difference between (16) and (17) may be intuitively understood as 
follows. The number of terms contributing to (17) is on average $\omega_n\sqrt{n}$ because most terms are zero. 


In contrast, all n terms are potential contributors to the sum in (15); the concentration 
inequality (16) fails because the individual contributions of the terms in the sum are too small. 

¹The choice of $\omega_n$ will eventually control a tradeoff between the number of covert bits and their difficulty of detection by the warden. To 
obtain a large number of covert bits, one would choose a large $\omega_n$, say $1/\log n$. In contrast, to make the bits harder to detect, one would 
choose a small $\omega_n$, say $\log n/\sqrt{n}$. 

²This holds if the channel $(\mathcal{X}, W_{Z|X}, \mathcal{Z})$ is a fully connected DMC, such as a BSC. 
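The contrast between the two concentration bounds can be seen numerically. The sketch below assumes a BSC(0.2) warden channel, the choice $\omega_n = n^{-1/4}$, and the standard multiplicative Chernoff form $2\exp(-\mu^2 m/3)$ for a sum of independent Bernoulli variables with mean m; these are illustrative assumptions, not values taken from the paper.

```python
import math

def kl(p, q):
    """KL divergence D(p||q) in nats (assumes p << q)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def hoeffding_bound(n, alpha, mu, q0, q1):
    """Right-hand side 2*exp(-2*n*mu^2*I^2/eta^2) for the information density sum."""
    qa = [alpha * a + (1.0 - alpha) * b for a, b in zip(q1, q0)]
    info = alpha * kl(q1, qa) + (1.0 - alpha) * kl(q0, qa)   # I(X;Z)
    # Range of log W(z|x)/Q_alpha(z) over all input/output pairs.
    values = [math.log(w / qz) for row in (q0, q1) for w, qz in zip(row, qa)]
    eta = max(values) - min(values)
    return 2.0 * math.exp(-2.0 * n * mu ** 2 * info ** 2 / eta ** 2)

def chernoff_bound(n, omega, mu):
    """Two-sided multiplicative Chernoff bound with mean weight omega*sqrt(n)."""
    return 2.0 * math.exp(-mu ** 2 * omega * math.sqrt(n) / 3.0)

q0, q1, mu = [0.8, 0.2], [0.2, 0.8], 0.5
sizes = (10**4, 10**6, 10**8)
hoeffding = [hoeffding_bound(n, n ** -0.25 / math.sqrt(n), mu, q0, q1) for n in sizes]
chernoff = [chernoff_bound(n, n ** -0.25, mu) for n in sizes]
```

With these assumptions the Hoeffding values stay close to 2 (hence uninformative), while the Chernoff values decrease with n.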

We note that an alternative approach to address this technical challenge would be to use more powerful concentration 
inequalities, such as Bernstein’s or Bennett’s inequalities. We do not pursue this approach here and we rely instead 
on the definition of suitable typical sets, which might be of independent interest. 

IV. SOURCE-RESOLVABILITY BASED COVERT COMMUNICATION 

In this section, we revisit the architecture for covert communication proposed in [2], which operates with a secret 
key S of on the order of $\sqrt{n}\log n$ bits and allows the transmission of on the order of $\sqrt{n}$ bits over n channel uses. 
The main result developed in Theorem 1 is a reinterpretation of the scheme in [2] from the perspective of source 
resolvability. For clarity, we assume here that T = 1 and no attempt is made to optimize the various constants 
appearing in the analysis. An optimal scheme handling the general case is presented in Section V. 



Fig. 2. Covert communication scheme adapted from [2]. A secret key is used to create spreading sequences in time, which are undetectable 
by the warden. Information is transmitted to the legitimate receiver by modulating the spreading sequences. 

The communication scheme illustrated in Fig. 2 is an adaptation to DMCs of the scheme proposed in [2] 
for AWGN channels, and it operates according to the following general principle. 

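1) Alice and Bob split the secret key S into two keys $\tilde{S} \in [\![1, \tilde{K}]\!]$ and $\bar{S} \in [\![1, \bar{K}]\!]$ such that $\tilde{K}\bar{K} = K$. 

2) Alice and Bob spread the secret key $\tilde{S}$ into a length n sequence $\tilde{\mathbf{X}} \in \{x_0, x_1\}^n$. 

3) Alice encodes the message W into a length n' binary codeword $\mathbf{B} \in \{0,1\}^{n'}$, where n' on the order of $\omega_n\sqrt{n}$ 
will be exactly specified later. 

4) Alice transmits information by modulating the symbols of $\tilde{\mathbf{X}}$ in the positions i for which $\tilde{X}_i = x_1$, resulting in 
a transmitted sequence $\mathbf{X}$. Formally, consider realizations $\tilde{\mathbf{x}}$, $\mathbf{b}$, and $\bar{\mathbf{s}}$, define 

$$\mathrm{supp}(\tilde{\mathbf{x}}) = \left|\{i \in [\![1, n]\!] : \tilde{x}_i \neq x_0\}\right|, \tag{18}$$

and let $\{i_j\}$ with $j \in [\![1, \mathrm{supp}(\tilde{\mathbf{x}})]\!]$ be the positions for which $\tilde{x}_{i_j} \neq x_0$. The symbols of the modulated 
sequence $\mathbf{x}$ are defined as 

$$x_i = \begin{cases} x_{b_j \oplus \bar{s}_j} & \text{if } \exists j \in [\![1, \min(\mathrm{supp}(\tilde{\mathbf{x}}), n')]\!] \text{ such that } i = i_j \\ x_{\bar{s}_j} & \text{if } \exists j \in [\![\min(\mathrm{supp}(\tilde{\mathbf{x}}), n') + 1, \mathrm{supp}(\tilde{\mathbf{x}})]\!] \text{ such that } i = i_j \\ x_0 & \text{otherwise.} \end{cases} \tag{19}$$

Effectively, the modulated sequence $\mathbf{X}$ is obtained by transmitting the sequence $\tilde{\mathbf{X}}$ through a memoryless 
Z-channel, in which the $x_0$ symbol is unaffected and the $x_1$ symbol is flipped to the $x_0$ symbol with probability 
1/2. We denote the transition probability of this Z channel by $W_{X|\tilde{X}}$ and we let 

$$\tilde{W}_{Z|\tilde{X}}(z|\tilde{x}) = \sum_{x} W_{Z|X}(z|x) W_{X|\tilde{X}}(x|\tilde{x}). \tag{20}$$

5) Upon observing the channel output $\mathbf{Y}$, Bob uses his knowledge of $\tilde{\mathbf{X}}$ to create a sequence $\tilde{\mathbf{Y}} = (Y_{i_1}, \dots, Y_{i_{\mathrm{supp}(\tilde{\mathbf{X}})}})$. 
If $\mathrm{supp}(\tilde{\mathbf{X}}) < n'$, Bob declares an error; otherwise, he attempts to decode $\tilde{\mathbf{Y}}$ with $\bar{S}$ to form an estimate $\hat{W}$ of 
W. 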
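The modulation and extraction steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: symbols $x_0, x_1$ are represented by 0 and 1, the first n' support positions carry codeword bits XORed with key bits, the remaining support positions carry key bits only, and the channel in the example is noiseless. The function names `modulate` and `extract` are hypothetical.

```python
def modulate(spread, bits, pad):
    """Modulate a spreading sequence (list of 0/1, 1 marking x1) with codeword bits.

    bits: codeword of length n'; pad: key bits, at least as many as the support size.
    """
    x = [0] * len(spread)
    j = 0  # index of the current support position
    for i, symbol in enumerate(spread):
        if symbol == 1:
            # First n' support positions carry b_j XOR s_j, the rest carry s_j only.
            x[i] = (bits[j] ^ pad[j]) if j < len(bits) else pad[j]
            j += 1
    return x

def extract(y, spread, n_prime):
    """Bob's step: keep the outputs at support positions, or None if too few exist."""
    positions = [i for i, symbol in enumerate(spread) if symbol == 1]
    if len(positions) < n_prime:
        return None  # Bob declares an error
    return [y[i] for i in positions]

# Hypothetical toy example with n = 8, n' = 3 and a noiseless channel.
spread = [0, 1, 0, 0, 1, 1, 0, 1]
bits = [1, 0, 1]
pad = [1, 1, 0, 1]
x = modulate(spread, bits, pad)
y_tilde = extract(x, spread, len(bits))
recovered = [y ^ s for y, s in zip(y_tilde[:len(bits)], pad)]
```

Because each support symbol is XORed with (or replaced by) a uniform key bit, roughly half of the $x_1$ symbols of the spreading sequence are turned into $x_0$, which is the Z-channel behavior described in step 4.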

Following the principle outlined in Section III-A, we first attempt to simulate the process $Q_{\alpha_n}^{\otimes n}$ by simulating 
the process $\Pi_{\alpha_n}^{\otimes n}$ defined as per (7) at the input of the channel. Specifically, the secret key $\tilde{S}$ is encoded into a 
sequence $\tilde{\mathbf{X}} \in \{x_0, x_1\}^n$ such that the distribution of $\tilde{\mathbf{X}}$ is close to $\Pi_{\alpha_n}^{\otimes n}$. The following theorem characterizes 
the performance of this covert communication scheme. 


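Theorem 1. Consider a discrete memoryless covert communication channel with $P_1 \ll P_0$, $Q_1 \ll Q_0$, and 
$Q_1 \neq Q_0$. Let C be the capacity of the main channel with inputs restricted to $\{x_0, x_1\}$ and let $\beta_n = \frac{\omega_n}{2\sqrt{n}}$ with 
$\omega_n \in o(1) \cap \omega(1/\sqrt{n})$ as $n \to \infty$. For any $\xi \in ]0;1[$, there exist $\xi_1, \xi_2 > 0$ depending on $W_{Y|X}$ but not on $W_{Z|X}$, 
and a covert communication scheme as in Fig. 2 such that, for n large enough: 

$$\log M = (1 - \xi)\omega_n\sqrt{n}\,C, \qquad \log K = (1 + \xi)\omega_n\sqrt{n}\log n, \tag{21}$$

and 

$$\mathbb{P}(\hat{W} \neq W \mid T = 1) \leq e^{-\xi_1\omega_n\sqrt{n}}, \qquad \left|\mathbb{D}(\widehat{Q}^n\|Q_0^{\otimes n}) - \mathbb{D}(Q_{\beta_n}^{\otimes n}\|Q_0^{\otimes n})\right| \leq e^{-\xi_2\omega_n\sqrt{n}}. \tag{22}$$

This scheme is universal w.r.t. the warden’s channel, in the sense that $\mathbb{D}(\widehat{Q}^n\|Q_0^{\otimes n})$ is bounded for n large enough 
as soon as $\chi_2(Q_1\|Q_0)$, $\chi_3(Q_1\|Q_0)$, and $\chi_4(Q_1\|Q_0)$ are bounded, irrespective of the exact statistics $W_{Z|X}$. 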

Remark. With some extra work, one may prove that the key $\bar{S}$ is not necessary. Specifically, one can develop a 
random coding argument that includes the random generation of the code used by the legitimate users, and establish 
similar results without relying on a key $\bar{S}$. We omit the proof, which is slightly more involved but does not affect the 
scaling in n. Also note that $\bar{S}$ acts as a one-time pad on the message, which guarantees that the message is kept 
confidential from the warden. 

Proof: The proof of Theorem 1 consists in showing the existence of a deterministic encoder to generate $\tilde{\mathbf{X}}$ 
from the key $\tilde{S}$, and the existence of a codebook with blocklength approximately $\omega_n\sqrt{n}$ to modulate $\tilde{\mathbf{X}}$ into $\mathbf{X}$. 
a) Existence of spreading code: Let K £W, e g] 0; 1[, and define the set 

7^” = {x G Y” : (1 - e)ijJns/n ^ supp(x) ^ (1 + e)oJns/n] . (23) 

Generate K codewords Xj G {xo,xi}" independently at random according to the distribution 

1 fx G LO 

n” ,(x) ^ (x) with an = ^ and A„ ^ (X G Tf) 

An s/n 

Using a Chernoff bound, we have 

1 - A„ = Pn- (X i Tfi) 

Finally, define the output distribution corresponding to ^ as 

X 

The encoder spreads a secret key s G [[1,P]] into a sequence x G {xo,xi}"' according to the map [[1,P]] —)■ 
{xo,xi}" : s I— 7 - Xg. The resulting spreading sequence distribution is then 

i=i ^ 

Our objective is to show that for suitably large $\tilde{K}$, the spreading sequence distribution $P_{\tilde{\mathbf{X}}}$ is close to the product 
distribution $\Pi_{\alpha_n}^{\otimes n}$. This is actually a variation of source resolvability [20], which we detail to carefully handle the 
dependence of $\Pi_{\alpha_n}^{\otimes n}$ on $\alpha_n$. As shown in the appendix, the average of $\mathbb{D}(P_{\tilde{\mathbf{X}}}\|\Pi_{\alpha_n}^{\otimes n})$ over the random code generation 
satisfies the following. 


(24) 

(25) 

(26) 




Lemma 2. For any 7 > 0 and all n G N* large enough, 


E(D(Px||n«’^)) ^ ^log f—) Pnr (supp(X) ^ 

-^n \Oln / " \ 


7 + nlog(l - On) 


log 


1 — an 




(28) 


For any ^ > 0, by choosing 

7 = (1 + ^j.)uJnVnlog ( — - 1 ) - nlog(l - an) (29) 

V«n / 

and noticing that supp(X) = Y^^=i l{Xi = xi\ with Enj" (supp(X)) = ujn\/n, we obtain with a Chernoff bound 

7 + nlog(l - an) 


Pn- supp(X) ^ 


log 


1 —CKn 


= Pn- (supp(X) > (1 + n)ujnVn) ^ e 


With an = ^ as per (131, notice that 


7 = (1 + fi)uJnVnlog (— - 1 ) - nlog(l - an) ^ (1 + n)uJnVn {logy/n- logWn) + 

\ J 


nuj<n 


(30) 


(31) 


y/n-UJn 

where we have used the inequality log(l + x) ^ for x g] — l;oo[. For n large enough, we also have 

logV^-logWn < logn by (131 and ^/n-ujn^ lhat 7 ^ (l + 2 /r)a;n\/nlogn and log ^ ^ log 2 +log n. 

Hence, choosing 

logiF = (1 + (5)(1 + 2^)a;n-\/nlogn with any <5 > 0, (32) 

we obtain for n large enough 

E(D(Pj^||n®”)) ^ some appropriately defined p> 0. (33) 

In particular, there exists a specific code for which 

D(Px||n*”) ^ and V(Px,H*”) ^ (34) 

where the bound on V(-, •) follows by Pinsker’s inequality. 

b) Effect of modulation: Irrespective of the error-control code used to encode W, modulation requires at most 

$$\log\bar{K} = (1 + \epsilon)\omega_n\sqrt{n} \tag{35}$$

by the constraint imposed in (23), which is negligible compared to $\log\tilde{K}$ in (32). When presenting the distribution 
$\Pi_{\alpha_n}$ at the input of the Z-channel induced by the modulation, one may check that the corresponding 
distribution at the output of the Z-channel is $\Pi_{\beta_n}$ with $\beta_n = \alpha_n/2$. Consequently, we have by the data processing 
inequality 

Next, notice that 


^©(^xlina!) and v(Q^g^”) ^v(Px,n«-). 


Kg^iigr) + 


Qr(z) 




with 


y; (Q^iz) - QX(z)) log 


(g"||Q7) + D{(37||(3r) + E (Q"(^) - QTF)) 'oifSrF 

QT» 


Qr(z) 




nV(Q", QT) log — ^ ne-5^‘^"^log 
ho 


1 

Fo' 


(36) 

(37) 

(38) 

(39) 


Hence, combining ([34|)-(39 1, we conclude that there exists a constant ^2 > 0 such that, for n large enough, 

l®(g”iiQr) - ®(Q^!iiQr) I ^ (40) 
c) Reliability: We conclude the proof by showing how one may encode the messages W into codewords $\mathbf{B}$. 
Assume that the main channel has capacity C when inputs are restricted to the set $\{x_0, x_1\}$. Standard arguments 
show that, for any $\delta > 0$, there exists a binary code of length $(1 - \epsilon)\omega_n\sqrt{n}$, such that one may choose 


$$\log M = (1 - \delta)(1 - \epsilon)\omega_n\sqrt{n}\,C$$


with probability of error 


Pen ^ ^ 




(41) 


(42) 


where > 0 depend on 6, e, and Wy\x- 

Combining the choice of $\log\tilde{K}$, $\log\bar{K}$, and $\log M$ in (32), (35), (41), with the bounds obtained in (40) and (42), 
one may then find the appropriate constant $\xi$ promised in the statement of the theorem. ■ 

The interpretation of Theorem 1 is the same as in [2, Theorem 1.2]. However, our underlying covert communication 
scheme is slightly different, as the key S may be viewed as the seed to generate a “spreading sequence,” rather than 
a way to index the positions for transmission. Technically, the result also differs from |5j by ensuring a bound on the 
maximum key size instead of the average key size, although it is still on the order of $\sqrt{n}\log n$. Finally, Theorem 1 
relies on more sophisticated resolvability techniques llU, whose usefulness will become apparent in Section V. 

From Theorem 1, one may now attempt to establish asymptotic limits akin to capacity. Unlike traditional 
information theoretic problems, there seems to be no strong converse and the factor $\omega_n$ that controls the decay of 
$\mathbb{D}(\widehat{Q}^n\|Q_0^{\otimes n})$ also affects $\log M$. Consequently, following the approach of p5l, $\log M$ is scaled by $\sqrt{n\mathbb{D}(\widehat{Q}^n\|Q_0^{\otimes n})}$ 
to obtain a meaningful asymptotic constant. 

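Corollary 1. Consider a discrete memoryless covert communication channel with $P_1 \ll P_0$, $Q_1 \ll Q_0$, and 
$Q_1 \neq Q_0$. Let C be the capacity of the main channel with inputs restricted to $\{x_0, x_1\}$. For any $\xi \in ]0;1[$, there 
exist covert communication schemes such that 

$$\lim_{n\to\infty}\mathbb{D}(\widehat{Q}^n\|Q_0^{\otimes n}) = 0, \qquad \lim_{n\to\infty}\mathbb{P}(\hat{W} \neq W \mid T = 1) = 0,$$

and 

$$\lim_{n\to\infty}\frac{\log M}{\sqrt{n\mathbb{D}(\widehat{Q}^n\|Q_0^{\otimes n})}} = 2(1 - \xi)\sqrt{\frac{2}{\chi_2(Q_1\|Q_0)}}\,C, \qquad \lim_{n\to\infty}\frac{\log K}{\sqrt{n\mathbb{D}(\widehat{Q}^n\|Q_0^{\otimes n})}} = \infty.$$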

Proof: Consider a sequence of coding schemes as identified by Theorem 1 for some $\xi > 0$. Then, using the 
remark after Lemma 1, we have 

/ / /.)„ \ , ^ 

(43) 


©(Q'llQD ^ nD(g^JIQo) + ^ Qo) 

0(Q"IIQr) ^ nBiQpJQn) - ^ X,(Qi|| Qo) (l - 




(44) 


Hence, lim^^oo ®(g”||go”) = 0> using the constraints on Un as per (13), we obtain 

logM ^ (l-^)t<;n^C 


lim 

n^oo 


nD(g"||g«-) 


> lim 


Wn\/ra^|x 2 (Qi|| Qo) (1 + ^) + 


= 2, 


X2(QiII Qo) 


(1-OC, 


(45) 


lim 

n^oo 


logM 


^ lim 


(1 - i)uJns/nC 


nD(Q"||Qr) 


‘^nv/ray^|x 2 (Qill Qo) (1 - ^) - 4® 


= 2 , 


X2(Qi|I Qo) 


(1-Oc. 


(46) 
V. Channel-resolvability based covert communication 

The covert communication scheme analyzed in Theorem 1 requires a key size on the order of $\omega_n\sqrt{n}\log n$ bits to 
transmit on the order of $\omega_n\sqrt{n}$ bits. In a practical implementation of the coding scheme, the key is likely to stem 
from a pseudo-random number generator, which opens the proposed scheme to attacks that could get particularly 
detrimental as the required key gets longer. Fortunately, we show next how the scheme may be suitably modified to 
use a key size on the order of $\omega_n\sqrt{n}$ bits. The idea behind the improvement is to use the key S to help directly 
simulate the covert process $Q_{\alpha_n}^{\otimes n}$ defined as per (7), without simulating the process $\Pi_{\alpha_n}^{\otimes n}$. Conceptually, the idea is to 
rely on channel resolvability in place of source resolvability, but the precise analysis requires some care because of 
the “low weight” nature of the process $\Pi_{\alpha_n}^{\otimes n}$. The use of channel resolvability also enables one to improve upon the 
value of $\log M$ in Theorem 1. The proposed architecture is illustrated in Fig. 3. The key S is used to select one of 
K codebooks, each containing M codewords for encoding message W. The underlying idea is then to guarantee 
that each codebook is sufficiently small to ensure reliability over the main channel while ensuring that there are 
sufficiently many distinct codewords overall to keep the warden confused. 



Fig. 3. Channel-resolvability based covert communication. The key S is used to select one of K possible codebooks. 


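Theorem 2. Consider a discrete memoryless covert communication channel with $P_1 \ll P_0$, $Q_1 \ll Q_0$, and $Q_1 \neq Q_0$. 
Let $\alpha_n = \omega_n/\sqrt{n}$ with $\omega_n \in o(1) \cap \omega(1/\sqrt{n})$ as $n \to \infty$. For any $\xi \in ]0;1[$, there exist $\xi_1, \xi_2 > 0$ depending on $W_{Y|X}$, 
$W_{Z|X}$, and a covert communication scheme as in Fig. 3 such that, for n large enough, 

$$\log M = (1 - \xi)\omega_n\sqrt{n}\,\mathbb{D}(P_1\|P_0),$$

$$\log K = \omega_n\sqrt{n}\left[(1 + \xi)\mathbb{D}(Q_1\|Q_0) - (1 - \xi)\mathbb{D}(P_1\|P_0)\right]^+,$$

and 

$$P^n_{\mathrm{err}} \leq e^{-\xi_1\omega_n\sqrt{n}}, \qquad \left|\mathbb{D}(\widehat{Q}^n\|Q_0^{\otimes n}) - \mathbb{D}(Q_{\alpha_n}^{\otimes n}\|Q_0^{\otimes n})\right| \leq e^{-\xi_2\omega_n\sqrt{n}}.$$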

Remark. The proof of Theorem actually shows an exponential concentration result, in the sense that a randomly 
generated codebook satisfies the reliabilty and covertness conditions with probability at least 1 — for some 

9 > 0. In some cases, it is possible to strengthen the result and show a super-exponential concentration result jj^, 
in the sense that a randomly generated codebook satisfies the reliability and covertness conditions with probability 
at least 1 — for some 9 > 0. 

Notice that no key is needed if $\mathbb{D}(P_1\|P_0) > \mathbb{D}(Q_1\|Q_0)$ by choosing $\xi$ small enough, in which case a single
codebook ($K = 1$) is sufficient to achieve both resolvability and reliability simultaneously. In contrast, when
$\mathbb{D}(P_1\|P_0) \le \mathbb{D}(Q_1\|Q_0)$, the proposed scheme requires a key to achieve covert communication.
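The keyless condition can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: it evaluates the message and key sizes of Theorem 2 normalized by $\omega_n\sqrt{n}$. The function names and the two binary symmetric channels (crossover probabilities 0.1 and 0.2) are assumptions chosen for illustration only.

```python
import math

def kl(p, q):
    """Relative entropy D(p||q) in nats; assumes p is absolutely continuous w.r.t. q."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def normalized_sizes(P0, P1, Q0, Q1, xi):
    """Return (log M, log K) divided by omega_n * sqrt(n), following Theorem 2."""
    d_main, d_warden = kl(P1, P0), kl(Q1, Q0)
    log_m = (1 - xi) * d_main
    log_k = max(0.0, (1 + xi) * d_warden - (1 - xi) * d_main)  # the [.]^+ operation
    return log_m, log_k

# Hypothetical channels: output distributions induced by x0 and x1.
P0, P1 = [0.9, 0.1], [0.1, 0.9]   # receiver: crossover 0.1
Q0, Q1 = [0.8, 0.2], [0.2, 0.8]   # warden: crossover 0.2

keyless = normalized_sizes(P0, P1, Q0, Q1, xi=0.1)
keyed = normalized_sizes(Q0, Q1, P0, P1, xi=0.1)   # roles of the two channels swapped
print(keyless)
print(keyed)
```

In the first call the receiver's channel is the better one, so the key term is clipped to zero; swapping the two channels makes it strictly positive.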

Proof: The proof of Theorem 2 is essentially a random coding argument for channel reliability [25] and channel
resolvability [26]; however, because the number of bits communicated is on the order of $\omega_n\sqrt{n}$ over $n$ channel uses,
naive concentration inequalities do not seem to apply directly. The idea we exploit to circumvent this technical
issue is to use suitably modified typical sets.

d) Random codebook generation: Let $M, K \in \mathbb{N}^*$. Generate $MK$ codewords $\mathbf{x}_{ij} \in \{x_0, x_1\}^n$ with $i \in [\![1,M]\!]$
and $j \in [\![1,K]\!]$ independently according to the product distribution $\Pi_{\alpha_n}^{\otimes n}$. Define the set

$$\mathcal{A}_\gamma^n \triangleq \left\{ (\mathbf{x},\mathbf{y}) \in \mathcal{X}^n \times \mathcal{Y}^n : \log\frac{W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})}{P_0^{\otimes n}(\mathbf{y})} > \gamma \right\} \qquad (47)$$

where $\gamma > 0$ will be determined later. The encoder simply maps a message $i$ and a key $j$ to the codeword $\mathbf{x}_{ij}$. The
decoder, which has access to $\mathbf{y}$ and the key $j$, operates as follows:

• if there exists a unique $i \in [\![1,M]\!]$ such that $(\mathbf{x}_{ij}, \mathbf{y}) \in \mathcal{A}_\gamma^n$, output $\hat{T} = 1$ and $\hat{W} = i$;

• if there is no codeword $i$ such that $(\mathbf{x}_{ij}, \mathbf{y}) \in \mathcal{A}_\gamma^n$, declare there was no communication and $\hat{T} = 0$;

• otherwise, declare a decoding error.
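The codebook generation and the three decoding rules above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: symbols $x_0$ and $x_1$ are encoded as 0 and 1, channel outputs are integer indices into the lists `P0` and `P1`, and all function names are hypothetical.

```python
import math
import random

def generate_codebooks(M, K, n, alpha, rng):
    """Draw K codebooks of M length-n codewords; each symbol is x1 (=1) with probability alpha."""
    return [[[1 if rng.random() < alpha else 0 for _ in range(n)]
             for _ in range(M)]
            for _ in range(K)]

def relative_entropy_density(x, y, P0, P1):
    """Sum over positions of log W(y_t|x_t)/P0(y_t); positions carrying x0 contribute 0."""
    total = 0.0
    for xt, yt in zip(x, y):
        if xt == 1:
            if P1[yt] == 0.0:
                return -math.inf
            total += math.log(P1[yt] / P0[yt])
    return total

def decode(y, codebook, gamma, P0, P1):
    """Return (T_hat, W_hat) for the codebook selected by the key."""
    hits = [i for i, x in enumerate(codebook)
            if relative_entropy_density(x, y, P0, P1) > gamma]
    if not hits:
        return (0, None)          # no communication declared
    if len(hits) == 1:
        return (1, hits[0])       # unique codeword above the threshold
    return ("error", None)        # several candidates: decoding error

# Toy example with a hand-written codebook (assumed values).
P0 = [0.9, 0.1]
P1 = [0.2, 0.8]
toy_codebook = [[1, 1, 0, 0], [0, 0, 1, 1]]
print(decode([1, 1, 0, 0], toy_codebook, 1.0, P0, P1))   # prints (1, 0)
```

Only the positions where a codeword carries $x_1$ enter the threshold test, which is why the decoder can also report the absence of transmission.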

e) Channel reliability analysis: As shown in Appendix D, the probability of decoding error $P_{\mathrm{err}}$ averaged over
the random codebook satisfies the following.


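Lemma 3. For any $\gamma > 0$,

$$\mathbb{E}(P_{\mathrm{err}}) \le \mathbb{P}_{W_{Y|X}^{\otimes n}\Pi_{\alpha_n}^{\otimes n}}\left(\log\frac{W_{Y|X}^{\otimes n}(\mathbf{Y}|\mathbf{X})}{P_0^{\otimes n}(\mathbf{Y})} \le \gamma\right) + M e^{-\gamma}\left(1 + \exp\left(\omega_n^2 (C-1)\right)\right) \quad\text{with}\quad C \triangleq \sum_y \frac{P_1(y)^2}{P_0(y)}. \qquad (48)$$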


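We now analyze the first term on the right-hand side of (48) more precisely. Since $W_{Y|X}^{\otimes n}$ and $\Pi_{\alpha_n}^{\otimes n}$ are product
distributions,

$$\mathbb{P}_{W_{Y|X}^{\otimes n}\Pi_{\alpha_n}^{\otimes n}}\left(\log\frac{W_{Y|X}^{\otimes n}(\mathbf{Y}|\mathbf{X})}{P_0^{\otimes n}(\mathbf{Y})} \le \gamma\right) = \mathbb{P}_{W_{Y|X}^{\otimes n}\Pi_{\alpha_n}^{\otimes n}}\left(\sum_{i=1}^n \log\frac{W_{Y|X}(Y_i|X_i)}{P_0(Y_i)} \le \gamma\right). \qquad (49)$$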


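If $X_i = x_0$, note that $Y_i$ is distributed according to $P_0$ and that $\log\frac{W_{Y|X}(Y_i|X_i)}{P_0(Y_i)} = 0$. Similarly, if $X_i = x_1$, $Y_i$ is
distributed according to $P_1$ and $\log\frac{W_{Y|X}(Y_i|X_i)}{P_0(Y_i)} = \log\frac{P_1(Y_i)}{P_0(Y_i)}$. Consequently, although the sum in (49) contains $n$ terms,
only those for which $X_i = x_1$ contribute to it. Therefore, we introduce the random variable $L \triangleq \sum_{i=1}^n \mathbb{1}\{X_i = x_1\}$
so that

$$\mathbb{P}_{W_{Y|X}^{\otimes n}\Pi_{\alpha_n}^{\otimes n}}\left(\sum_{i=1}^n \log\frac{W_{Y|X}(Y_i|X_i)}{P_0(Y_i)} \le \gamma\right) = \mathbb{E}_L\left(\mathbb{P}_{W_{Y|X}^{\otimes n}\Pi_{\alpha_n}^{\otimes n}}\left(\sum_{i=1}^n \log\frac{W_{Y|X}(Y_i|X_i)}{P_0(Y_i)} \le \gamma \,\Big|\, L\right)\right) = \mathbb{E}_L\left(\mathbb{P}_{P_1^{\otimes L}}\left(\sum_{i=1}^L \log\frac{P_1(Y_i)}{P_0(Y_i)} \le \gamma\right)\right). \qquad (50)$$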


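Let $\mu, \nu \in ]0;1[$ and set

$$\gamma \triangleq (1-\mu)(1-\nu)\,\omega_n\sqrt{n}\,\mathbb{D}(P_1\|P_0) \quad\text{and}\quad \mathcal{C}_\mu^n \triangleq \left\{\ell \in \mathbb{N}^* : \ell > (1-\mu)\,\omega_n\sqrt{n}\right\}. \qquad (51)$$

Intuitively, $\exp\gamma$ represents the number of codewords in a codebook while $\mathcal{C}_\mu^n$ represents the likely support size of
the codewords. Then,

$$\mathbb{E}_L\left(\mathbb{P}_{P_1^{\otimes L}}\left(\sum_{i=1}^L \log\frac{P_1(Y_i)}{P_0(Y_i)} \le \gamma\right)\right) \le \sum_{\ell \in \mathcal{C}_\mu^n} \mathbb{P}(L = \ell)\,\mathbb{P}_{P_1^{\otimes \ell}}\left(\sum_{i=1}^{\ell} \log\frac{P_1(Y_i)}{P_0(Y_i)} \le \gamma\right) + \mathbb{P}(L \notin \mathcal{C}_\mu^n). \qquad (52)$$

Since $\mathbb{E}(L) = \sum_{i=1}^n \mathbb{E}(\mathbb{1}\{X_i = x_1\}) = \omega_n\sqrt{n}$, we obtain with a Chernoff bound

$$\mathbb{P}(L \notin \mathcal{C}_\mu^n) = \mathbb{P}(L \le (1-\mu)\,\mathbb{E}(L)) \le e^{-\frac{1}{2}\mu^2\omega_n\sqrt{n}}. \qquad (53)$$

For $\ell \in \mathcal{C}_\mu^n$, we have

$$(1-\mu)(1-\nu)\,\omega_n\sqrt{n} - \ell < (1-\nu)\ell - \ell = -\nu\ell, \qquad (54)$$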
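The concentration of the codeword weight $L$ can be checked numerically. The sketch below compares the exact lower tail of a binomial random variable with the standard multiplicative Chernoff bound $e^{-\mu^2\mathbb{E}(L)/2}$; the specific values of $n$, $\omega_n$, and $\mu$ are illustrative assumptions, and the bound used is the textbook one, which may differ in its constant from the one intended in the source.

```python
import math

def binomial_lower_tail(n, p, k):
    """Exact P(L <= k) for L ~ Binomial(n, p)."""
    return sum(math.comb(n, l) * p**l * (1 - p)**(n - l) for l in range(k + 1))

def chernoff_lower_tail(mean, mu):
    """Multiplicative Chernoff bound on P(L <= (1 - mu) * mean), valid for 0 < mu < 1."""
    return math.exp(-mu**2 * mean / 2)

n = 400
omega_n = 1.0                       # illustrative value only
alpha_n = omega_n / math.sqrt(n)    # probability of sending x1 in each position
mean = n * alpha_n                  # E(L) = omega_n * sqrt(n)
mu = 0.5

exact = binomial_lower_tail(n, alpha_n, math.floor((1 - mu) * mean))
bound = chernoff_lower_tail(mean, mu)
print(exact, bound)
```

The exact tail sits well below the bound, which is all the proof requires: the weight of a random codeword rarely falls outside $\mathcal{C}_\mu^n$.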


Footnote: The traditional typical set for decoding is similar to $\mathcal{A}_\gamma^n$ but with $P_{\alpha_n}^{\otimes n}(\mathbf{y})$ in place of $P_0^{\otimes n}(\mathbf{y})$ [25]. This amounts to using the information
density in place of the relative entropy density.

Footnote: Since the scheme sometimes allows keyless operation ($K = 1$), the decoder must be able to identify the absence of transmission without
relying on a key.
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SO that 




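$$\mathbb{P}_{P_1^{\otimes \ell}}\left(\sum_{i=1}^{\ell} \log\frac{P_1(Y_i)}{P_0(Y_i)} \le \gamma\right) = \mathbb{P}_{P_1^{\otimes \ell}}\left(\sum_{i=1}^{\ell}\left(\log\frac{P_1(Y_i)}{P_0(Y_i)} - \mathbb{D}(P_1\|P_0)\right) \le \gamma - \ell\,\mathbb{D}(P_1\|P_0)\right) \qquad (55)$$

$$\le \mathbb{P}_{P_1^{\otimes \ell}}\left(\sum_{i=1}^{\ell}\left(\log\frac{P_1(Y_i)}{P_0(Y_i)} - \mathbb{D}(P_1\|P_0)\right) \le -\nu\ell\,\mathbb{D}(P_1\|P_0)\right) \le A e^{-a\ell} \quad\text{for some constants } A, a > 0, \qquad (56)$$

where the constants $A$ and $a$ are obtained using a concentration inequality, such as Hoeffding's inequality. Combining
(50)–(56), using $\ell > (1-\mu)\,\omega_n\sqrt{n}$ for $\ell \in \mathcal{C}_\mu^n$, and substituting in (48), we obtain

$$\mathbb{E}(P_{\mathrm{err}}) \le e^{-\frac{1}{2}\mu^2\omega_n\sqrt{n}} + A e^{-a(1-\mu)\omega_n\sqrt{n}} + M e^{-\gamma}\left(1 + e^{\omega_n^2(C-1)}\right). \qquad (57)$$

Hence, if

$$\log M = (1-\delta)(1-\mu)(1-\nu)\,\omega_n\sqrt{n}\,\mathbb{D}(P_1\|P_0) \quad\text{with } \delta \in ]0;1[, \qquad (58)$$

we obtain

$$\mathbb{E}(P_{\mathrm{err}}) \le e^{-\frac{1}{2}\mu^2\omega_n\sqrt{n}} + A e^{-a(1-\mu)\omega_n\sqrt{n}} + e^{-\delta(1-\mu)(1-\nu)\omega_n\sqrt{n}\,\mathbb{D}(P_1\|P_0)}\left(1 + e^{\omega_n^2(C-1)}\right). \qquad (59)$$

For $n$ large enough, with the choice of $\omega_n$ in (13), $1 + \exp\left(\omega_n^2(C-1)\right) \le e$ so that

$$\mathbb{E}(P_{\mathrm{err}}) \le e^{-\rho_1\omega_n\sqrt{n}} \quad\text{for some appropriate choice of } \rho_1 > 0. \qquad (60)$$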

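f) Channel resolvability analysis: The objective is to show that the distribution

$$\hat{Q}^n(\mathbf{z}) \triangleq \sum_{i=1}^{M}\sum_{j=1}^{K} W_{Z|X}^{\otimes n}(\mathbf{z}|\mathbf{x}_{ij})\,\frac{1}{MK}$$

induced by the codebooks is close in divergence to $Q_{\alpha_n}^{\otimes n}(\mathbf{z})$. The proof largely follows that of [18], with the appropriate
modifications. As shown in Appendix E, the divergence $\mathbb{D}(\hat{Q}^n\|Q_{\alpha_n}^{\otimes n})$ averaged over the random codebook satisfies
the following.

Lemma 4. For any $\tau > 0$ and for $n$ large enough,

$$\mathbb{E}\left(\mathbb{D}(\hat{Q}^n\|Q_{\alpha_n}^{\otimes n})\right) \le n\log\frac{2}{\mu_0}\;\mathbb{P}_{W_{Z|X}^{\otimes n}\Pi_{\alpha_n}^{\otimes n}}\left(\log\frac{W_{Z|X}^{\otimes n}(\mathbf{Z}|\mathbf{X})}{Q_0^{\otimes n}(\mathbf{Z})} \ge \tau\right) + \frac{e^{\tau}}{MK}. \qquad (61)$$

Note that

$$\mathbb{P}_{W_{Z|X}^{\otimes n}\Pi_{\alpha_n}^{\otimes n}}\left(\log\frac{W_{Z|X}^{\otimes n}(\mathbf{Z}|\mathbf{X})}{Q_0^{\otimes n}(\mathbf{Z})} \ge \tau\right) = \mathbb{P}_{W_{Z|X}^{\otimes n}\Pi_{\alpha_n}^{\otimes n}}\left(\sum_{i=1}^{n}\log\frac{W_{Z|X}(Z_i|X_i)}{Q_0(Z_i)} \ge \tau\right). \qquad (62)$$

As in the channel reliability analysis, if $X_i = x_0$, note that $Z_i$ is distributed according to $Q_0$ and that $\log\frac{W_{Z|X}(Z_i|X_i)}{Q_0(Z_i)} = 0$;
if $X_i = x_1$, then $Z_i$ is distributed according to $Q_1$ and $\log\frac{W_{Z|X}(Z_i|X_i)}{Q_0(Z_i)} = \log\frac{Q_1(Z_i)}{Q_0(Z_i)}$. Hence, we may proceed as
earlier, by introducing $L \triangleq \sum_{i=1}^{n}\mathbb{1}\{X_i = x_1\}$ and defining

$$\tau \triangleq (1+\mu)(1+\nu)\,\omega_n\sqrt{n}\,\mathbb{D}(Q_1\|Q_0) \quad\text{and}\quad \mathcal{D}_\mu^n \triangleq \left\{\ell \in \mathbb{N}^* : |\ell - \omega_n\sqrt{n}| < \mu\,\omega_n\sqrt{n}\right\}. \qquad (63)$$

Footnote: $\log\frac{P_1(Y)}{P_0(Y)}$ is bounded under our assumptions.

Intuitively, $\exp\tau$ represents the total number of codewords while $\mathcal{D}_\mu^n$ is their likely support size. The set $\mathcal{D}_\mu^n$ differs
from $\mathcal{C}_\mu^n$ by requiring a double-sided bound, which captures the idea that the support of codewords should not be
too small, for reliability, but not too large either, to remain covert. Then,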




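$$\mathbb{P}_{W_{Z|X}^{\otimes n}\Pi_{\alpha_n}^{\otimes n}}\left(\sum_{i=1}^{n}\log\frac{W_{Z|X}(Z_i|X_i)}{Q_0(Z_i)} \ge \tau\right) = \mathbb{E}_L\left(\mathbb{P}_{Q_1^{\otimes L}}\left(\sum_{i=1}^{L}\log\frac{Q_1(Z_i)}{Q_0(Z_i)} \ge \tau\right)\right) \qquad (64)$$

$$\le \sum_{\ell \in \mathcal{D}_\mu^n}\mathbb{P}(L = \ell)\,\mathbb{P}_{Q_1^{\otimes \ell}}\left(\sum_{i=1}^{\ell}\log\frac{Q_1(Z_i)}{Q_0(Z_i)} \ge \tau\right) + \mathbb{P}(L \notin \mathcal{D}_\mu^n), \qquad (65)$$

with, using a Chernoff bound,

$$\mathbb{P}(L \notin \mathcal{D}_\mu^n) = \mathbb{P}\left(|L - \omega_n\sqrt{n}| \ge \mu\,\omega_n\sqrt{n}\right) \le 2e^{-\frac{1}{3}\mu^2\omega_n\sqrt{n}}. \qquad (66)$$

For $\ell \in \mathcal{D}_\mu^n$, note that

$$(1+\mu)(1+\nu)\,\omega_n\sqrt{n} - \ell > (1+\nu)\ell - \ell = \nu\ell, \qquad (67)$$

so that

$$\mathbb{P}_{Q_1^{\otimes \ell}}\left(\sum_{i=1}^{\ell}\log\frac{Q_1(Z_i)}{Q_0(Z_i)} \ge \tau\right) = \mathbb{P}_{Q_1^{\otimes \ell}}\left(\sum_{i=1}^{\ell}\left(\log\frac{Q_1(Z_i)}{Q_0(Z_i)} - \mathbb{D}(Q_1\|Q_0)\right) \ge \tau - \ell\,\mathbb{D}(Q_1\|Q_0)\right) \qquad (68)$$

$$\le \mathbb{P}_{Q_1^{\otimes \ell}}\left(\sum_{i=1}^{\ell}\left(\log\frac{Q_1(Z_i)}{Q_0(Z_i)} - \mathbb{D}(Q_1\|Q_0)\right) \ge \nu\ell\,\mathbb{D}(Q_1\|Q_0)\right) \qquad (69)$$

$$\le B e^{-b\ell} \quad\text{for some constants } B, b > 0, \qquad (70)$$

where the constants $B$ and $b$ are again obtained using Hoeffding's inequality. Combining the inequalities (64)–(70)
with Lemma 4, using $\ell > (1-\mu)\,\omega_n\sqrt{n}$ for $\ell \in \mathcal{D}_\mu^n$, and choosing

$$\log M + \log K = (1+\delta)(1+\mu)(1+\nu)\,\omega_n\sqrt{n}\,\mathbb{D}(Q_1\|Q_0), \qquad (71)$$

we obtain

$$\mathbb{E}\left(\mathbb{D}(\hat{Q}^n\|Q_{\alpha_n}^{\otimes n})\right) \le n\log\frac{2}{\mu_0}\left(2e^{-\frac{1}{3}\mu^2\omega_n\sqrt{n}} + B e^{-b(1-\mu)\omega_n\sqrt{n}}\right) + e^{-\delta(1+\mu)(1+\nu)\omega_n\sqrt{n}\,\mathbb{D}(Q_1\|Q_0)}. \qquad (72)$$

Hence, for $n$ large enough,

$$\mathbb{E}\left(\mathbb{D}(\hat{Q}^n\|Q_{\alpha_n}^{\otimes n})\right) \le e^{-\rho_2\omega_n\sqrt{n}} \quad\text{for some appropriate choice of } \rho_2 > 0. \qquad (73)$$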

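g) Identification of specific code: Choosing $\mu$, $\nu$, $\delta$, $\log M$, and $\log K$ to satisfy both (58) and (71), Markov's
inequality allows us to conclude that there exists at least one specific coding scheme with $n$ large enough and
appropriate constants $\xi_1, \xi_3 > 0$ such that

$$P_{\mathrm{err}} \le e^{-\xi_1\omega_n\sqrt{n}} \quad\text{and}\quad \mathbb{D}(\hat{Q}^n\|Q_{\alpha_n}^{\otimes n}) \le e^{-\xi_3\omega_n\sqrt{n}}.$$

In particular, Pinsker's inequality also ensures that $\mathbb{V}(\hat{Q}^n, Q_{\alpha_n}^{\otimes n}) \le e^{-\frac{1}{2}\xi_3\omega_n\sqrt{n}}$. Next, notice that

$$\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}) = \sum_{\mathbf{z}}\hat{Q}^n(\mathbf{z})\log\frac{\hat{Q}^n(\mathbf{z})}{Q_0^{\otimes n}(\mathbf{z})} \qquad (74)$$

$$= \mathbb{D}(\hat{Q}^n\|Q_{\alpha_n}^{\otimes n}) + \sum_{\mathbf{z}}\hat{Q}^n(\mathbf{z})\log\frac{Q_{\alpha_n}^{\otimes n}(\mathbf{z})}{Q_0^{\otimes n}(\mathbf{z})} \qquad (75)$$

$$= \mathbb{D}(\hat{Q}^n\|Q_{\alpha_n}^{\otimes n}) + \mathbb{D}(Q_{\alpha_n}^{\otimes n}\|Q_0^{\otimes n}) + \sum_{\mathbf{z}}\left(\hat{Q}^n(\mathbf{z}) - Q_{\alpha_n}^{\otimes n}(\mathbf{z})\right)\log\frac{Q_{\alpha_n}^{\otimes n}(\mathbf{z})}{Q_0^{\otimes n}(\mathbf{z})} \qquad (76)$$

and

$$\left|\sum_{\mathbf{z}}\left(\hat{Q}^n(\mathbf{z}) - Q_{\alpha_n}^{\otimes n}(\mathbf{z})\right)\log\frac{Q_{\alpha_n}^{\otimes n}(\mathbf{z})}{Q_0^{\otimes n}(\mathbf{z})}\right| \le 2n\,\mathbb{V}(\hat{Q}^n, Q_{\alpha_n}^{\otimes n})\log\frac{1}{\mu_0} \le 2n\,e^{-\frac{1}{2}\xi_3\omega_n\sqrt{n}}\log\frac{1}{\mu_0}. \qquad (77)$$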

Hence, combining (74)–(77), we conclude that there exists a constant $\xi_2 > 0$ such that, for $n$ large enough,

$$\left|\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}) - \mathbb{D}(Q_{\alpha_n}^{\otimes n}\|Q_0^{\otimes n})\right| \le e^{-\xi_2\omega_n\sqrt{n}}.$$

The statement of the theorem is finally obtained by setting $\xi \triangleq \max\left((1+\delta)(1+\mu)(1+\nu) - 1,\; 1 - (1-\delta)(1-\mu)(1-\nu)\right)$.


Remark. A closer inspection of Appendix D shows that Lemma 3 applies to continuous channels. The concentration
result follows with any condition that guarantees a concentration result for the sum of $n$ i.i.d. realizations of
$\log\frac{P_1(Y)}{P_0(Y)}$. In particular, the concentration follows directly if $\log\frac{P_1(Y)}{P_0(Y)}$ is sub-Gaussian [23]. The adaptation of
Lemma 4 to continuous channels is discussed in Section VII-D.

As in Section IV, one may also characterize the asymptotic scaling of $\log M$ and $\log K$ for the proposed scheme.
We shall see in Section VI that the scalings of the message and key size are optimal.


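Corollary 2. Consider a discrete memoryless covert communication channel with $P_1 \ll P_0$, $Q_1 \ll Q_0$, and
$Q_1 \neq Q_0$. For any $\xi \in ]0;1[$, there exist covert communication schemes such that

$$\lim_{n\to\infty}\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}) = 0, \qquad \lim_{n\to\infty}P_{\mathrm{err}} = 0,$$

$$\lim_{n\to\infty}\frac{\log M}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}} = (1-\xi)\sqrt{\frac{2}{\chi_2(Q_1\|Q_0)}}\;\mathbb{D}(P_1\|P_0),$$

$$\lim_{n\to\infty}\frac{\log K}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}} = \sqrt{\frac{2}{\chi_2(Q_1\|Q_0)}}\;\left[(1+\xi)\,\mathbb{D}(Q_1\|Q_0) - (1-\xi)\,\mathbb{D}(P_1\|P_0)\right]^+.$$

Proof: The result follows as in the proof of Corollary 1 and is omitted for brevity. ■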

VI. Converse result for DMCs 

In this section, we show the optimality of the asymptotic limits given in Corollary The proof leverages the 
converse technique and results of 0], lO, ll27l . 

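Theorem 3. Consider a discrete memoryless covert communication channel with $P_1 \ll P_0$, $Q_1 \ll Q_0$, and $Q_1 \neq Q_0$.
Consider a sequence of covert communication schemes with increasing blocklength $n$ characterized by $\epsilon_n \triangleq P_{\mathrm{err}}$
and $\delta_n \triangleq \mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})$. If $\lim_{n\to\infty}\epsilon_n = \lim_{n\to\infty}\delta_n = 0$, we have

$$\lim_{n\to\infty}\frac{\log M}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}} \le \sqrt{\frac{2}{\chi_2(Q_1\|Q_0)}}\;\mathbb{D}(P_1\|P_0). \qquad (78)$$

For a sequence of schemes such that (78) holds with equality, we have

$$\lim_{n\to\infty}\frac{\log M + \log K}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}} \ge \sqrt{\frac{2}{\chi_2(Q_1\|Q_0)}}\;\mathbb{D}(Q_1\|Q_0). \qquad (79)$$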


Proof: The proof of (781 is an adaptation of ||5l Proof of Theorem 2]. The proof of (791 follows by adapting 
the steps of ETl Section 5.2.3] to lower bound the sum logM + log AT. We detail here the modifications required 
to analyze the present setting. 

Consider a sequence of length-n codes for the setting in Fig. 0 with Cn = Pen and dn — D((5”||Qo”)’ 
lim„_>.oo Cn = lim„_ 5 .oo dn = 0, and logM takes the maximum value such that lim„_>.oo logM = oo. We start by 
upper bounding log M using standard techniques. 


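$$\log M = \mathbb{H}(W) = \mathbb{I}(W; Y^n S) + \mathbb{H}(W|Y^n S) \qquad (80)$$

$$\le \mathbb{I}(W; Y^n S) + \mathbb{H}_b(\epsilon_n) + \epsilon_n\log M \qquad (81)$$

$$= \mathbb{I}(W; Y^n|S) + \mathbb{H}_b(\epsilon_n) + \epsilon_n\log M \qquad (82)$$

$$\le \mathbb{I}(WS; Y^n) + \mathbb{H}_b(\epsilon_n) + \epsilon_n\log M \qquad (83)$$

$$\le \mathbb{I}(X^n; Y^n) + \mathbb{H}_b(\epsilon_n) + \epsilon_n\log M \qquad (84)$$

$$\le n\,\mathbb{I}(\tilde{X}; \tilde{Y}) + \mathbb{H}_b(\epsilon_n) + \epsilon_n\log M, \qquad (85)$$

where the random variables $\tilde{X}$ and $\tilde{Y}$ have distribution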


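$$\tilde{P}_X(x) \triangleq \frac{1}{n}\sum_{i=1}^{n}P_{X_i}(x) \quad\text{and}\quad \tilde{P}_{XY}(x,y) = \tilde{P}_X(x)\,W_{Y|X}(y|x). \qquad (86)$$

Hence,

$$\log M \le \frac{1}{1-\epsilon_n}\left(n\,\mathbb{I}(\tilde{X};\tilde{Y}) + \mathbb{H}_b(\epsilon_n)\right). \qquad (87)$$

Following [27], we obtain

$$\log M + \log K \ge \mathbb{H}(WS) \qquad (88)$$

$$\ge \mathbb{I}(WS; Z^n) \qquad (89)$$

$$\overset{(a)}{\ge} \mathbb{I}(X^n; Z^n) \qquad (90)$$

$$\overset{(b)}{\ge} n\,\mathbb{I}(\tilde{X};\tilde{Z}) - \delta_n, \qquad (91)$$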


where (a) follows because X^ is a function of W and S, and (b) follows by the steps of ETl Section 5.2.3] upon 
defining the random variables X and Z to have joint distribution 

$$\tilde{P}_{XZ}(x,z) = \tilde{P}_X(x)\,W_{Z|X}(z|x). \qquad (92)$$

Following the reasoning of Q, ll27l . one can show 

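$$\delta_n = \mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}) \ge n\,\mathbb{D}(\tilde{Q}\|Q_0) \quad\text{with}\quad \tilde{Q}(z) \triangleq \frac{1}{n}\sum_{i=1}^{n}\hat{Q}_i(z). \qquad (93)$$

Applying Pinsker's inequality, we see that $\lim_{n\to\infty}\mathbb{V}(\tilde{Q}, Q_0) = 0$ so that $\forall z$ $\lim_{n\to\infty}\tilde{Q}(z) = Q_0(z)$, and $\tilde{P}_X$ must
be of the form

$$\tilde{P}_X(x) = (1-\mu_n)\,\mathbb{1}\{x = x_0\} + \mu_n\,\mathbb{1}\{x = x_1\} \quad\text{with}\quad \lim_{n\to\infty}\mu_n = 0.$$

Using the notation of Section III-A, we may write $\tilde{P}_X = \Pi_{\mu_n}$, $\tilde{Q} = Q_{\mu_n}$, and $\tilde{P}_Y = P_{\mu_n}$. Using the bounds given in
Lemma 1, we find that

$$\frac{\mu_n^2}{2}\chi_2(Q_1\|Q_0)\left(1 - \sqrt{\mu_n}\right) \le \mathbb{D}(Q_{\mu_n}\|Q_0) \le \frac{\mu_n^2}{2}\chi_2(Q_1\|Q_0)\left(1 + \sqrt{\mu_n}\right), \qquad (94)$$

$$\mathbb{I}(\tilde{X};\tilde{Y}) = \mu_n\mathbb{D}(P_1\|P_0) - \mathbb{D}(P_{\mu_n}\|P_0) \le \mu_n\mathbb{D}(P_1\|P_0), \qquad (95)$$

$$\mathbb{I}(\tilde{X};\tilde{Z}) = \mu_n\mathbb{D}(Q_1\|Q_0) - \mathbb{D}(Q_{\mu_n}\|Q_0). \qquad (96)$$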


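Note that the lower bound of $\mathbb{D}(Q_{\mu_n}\|Q_0)$ in (94) combined with the inequality in (93) imposes that $\lim_{n\to\infty}\sqrt{n}\,\mu_n = 0$.
The constraint $\lim_{n\to\infty}\log M = \infty$ combined with (87) and (95) also requires that $\lim_{n\to\infty}n\mu_n = \infty$. Hence,

$$\frac{\log M}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}} \le \frac{n\,\mathbb{I}(\tilde{X};\tilde{Y}) + \mathbb{H}_b(\epsilon_n)}{(1-\epsilon_n)\sqrt{n\,\delta_n}} \le \frac{n\mu_n\mathbb{D}(P_1\|P_0) + \mathbb{H}_b(\epsilon_n)}{(1-\epsilon_n)\,n\mu_n\sqrt{\frac{1}{2}\chi_2(Q_1\|Q_0)\left(1-\sqrt{\mu_n}\right)}} = \sqrt{\frac{2}{\chi_2(Q_1\|Q_0)\left(1-\sqrt{\mu_n}\right)}}\;\frac{\mathbb{D}(P_1\|P_0) + \frac{\mathbb{H}_b(\epsilon_n)}{n\mu_n}}{1-\epsilon_n},$$

and

$$\lim_{n\to\infty}\frac{\log M}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}} \le \sqrt{\frac{2}{\chi_2(Q_1\|Q_0)}}\;\mathbb{D}(P_1\|P_0). \qquad (97)$$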
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For any sequence of codes such that (97 1 holds with equality, which we know is indeed possible from Corollary ( [87] ) 
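combined with (95) and $\lim_{n\to\infty}n\mu_n = \infty$ impose that, for any $\beta > 0$ and $n$ large enough,

$$(1-\beta)\sqrt{\frac{2}{\chi_2(Q_1\|Q_0)}}\;\mathbb{D}(P_1\|P_0) \le \frac{n\mu_n\mathbb{D}(P_1\|P_0)}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}}. \qquad (98)$$

Hence, combining (91), (96), (98), we obtain for any $\beta > 0$

$$\frac{\log M + \log K}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}} \ge \frac{n\mu_n\mathbb{D}(Q_1\|Q_0) - n\,\mathbb{D}(Q_{\mu_n}\|Q_0) - \delta_n}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}} \ge (1-\beta)\sqrt{\frac{2}{\chi_2(Q_1\|Q_0)}}\;\mathbb{D}(Q_1\|Q_0) - \frac{n\,\mathbb{D}(Q_{\mu_n}\|Q_0)}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}} - \frac{\delta_n}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}}.$$

Since $\beta > 0$ is arbitrary and the last two terms vanish by (93), we have

$$\lim_{n\to\infty}\frac{\log M + \log K}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}} \ge \sqrt{\frac{2}{\chi_2(Q_1\|Q_0)}}\;\mathbb{D}(Q_1\|Q_0).$$

If $W_{Z|X} = W_{Y|X}$, the right-hand side of (78) is actually a special case of [5, Theorem 2] for two inputs.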


VII. Extensions and applications 

A. Non-vanishing $\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})$

Instead of requiring that $\lim_{n\to\infty}\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}) = 0$, we could relax the constraint by asking that $\lim_{n\to\infty}\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}) = \delta$
for some chosen $\delta > 0$. The optimal scalings of $\log M$ and $\log K$ with $n$ obtained in this case are summarized in
Table II and Table III, respectively. The result when $Q_1 \ll Q_0$ and $P_1 \ll P_0$ is obtained by choosing a sequence
$\omega_n$ such that $\lim_{n\to\infty}\omega_n$ is a positive constant in the proof of Theorem 2. The results for the other situations are obtained
with the same modification in the analysis of Appendix G.

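TABLE II

Optimal scaling of $\log M$ for which $\lim_{n\to\infty}\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}) = \delta > 0$.

                        $P_1 \ll P_0$          $P_1 \not\ll P_0$            $P_1 = P_0$
$Q_1 \ll Q_0$           $\Theta(\sqrt{n})$     $\Theta(\sqrt{n}\log n)$     0
$Q_1 \not\ll Q_0$       0                      0                            0
$Q_1 = Q_0$             $\Theta(n)$            $\Theta(n)$                  0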

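TABLE III

Optimal scaling of $\log K$ for which $\lim_{n\to\infty}\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}) = \delta > 0$.

                        $P_1 \ll P_0$          $P_1 \not\ll P_0$            $P_1 = P_0$
$Q_1 \ll Q_0$           $\Theta(\sqrt{n})$     0                            0
$Q_1 \not\ll Q_0$       0                      0                            0
$Q_1 = Q_0$             0                      0                            0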

B. Multiple symbols 


A close inspection of the proofs shows that the calculations may be extended to multiple symbols $\{x_i\}_{i\in[\![1,N]\!]}$ such
that $\forall i \in [\![1,N]\!]$, $x_i \neq x_0$. Specifically, assume that each symbol $x_i$ is assigned probability $\alpha_n p_i$ with $\sum_{i=1}^{N}p_i = 1$.
Denote $P_i = W_{Y|X=x_i}$ and $Q_i = W_{Z|X=x_i}$. Following verbatim the approach of Section III-A, one may redefine

$$Q_{\alpha_n}(z) = \alpha_n\sum_{i=1}^{N}p_i Q_i(z) + (1-\alpha_n)\,Q_0(z), \qquad (99)$$
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SO that for all n G N*, 

D{Qe.JQo)^^xJ^P^Q^ 


Qo) - 


Qo J + 


Qo 


( 100 ) 


and for n large enough, 

„2 


oiQaJQo)^^x,{^P^Q^ 


Qo] -al 


Xs 


'^PiQi 


Qoj -\% {^PiQi 

‘lat 


+ 


Qojj 
vA'^PiQi 


Qo 


( 101 ) 


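Following the proof of Theorem 2, we then obtain the following result.

Theorem 4. Consider a discrete memoryless covert communication channel such that $Q_0$ is not a mixture of
$\{Q_i\}_{i\in[\![1,N]\!]}$ and, $\forall i \in [\![1,N]\!]$, $Q_i \ll Q_0$ and $P_i \ll P_0$. Let $\{p_i\}_{i\in[\![1,N]\!]} \in [0;1]^N$ be such that $\sum_{i=1}^{N}p_i = 1$ and
$\alpha_n = \frac{\omega_n}{\sqrt{n}}$ with $\omega_n$ chosen as in (13), so that in particular $\omega_n \in o(1)$ as $n \to \infty$. For any $\xi \in ]0;1[$, there exist $\xi_1, \xi_2 > 0$
depending on $\xi$, $W_{Y|X}$, $W_{Z|X}$, and a covert communication scheme as in Fig. 3 such that, for $n$ large enough,

$$\log M = (1-\xi)\,\omega_n\sqrt{n}\left(\sum_{i=1}^{N}p_i\,\mathbb{D}(P_i\|P_0)\right),$$

$$\log K = \omega_n\sqrt{n}\left[(1+\xi)\sum_{i=1}^{N}p_i\,\mathbb{D}(Q_i\|Q_0) - (1-\xi)\sum_{i=1}^{N}p_i\,\mathbb{D}(P_i\|P_0)\right]^+,$$

and

$$P_{\mathrm{err}} \le e^{-\xi_1\omega_n\sqrt{n}}, \qquad \left|\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}) - \mathbb{D}(Q_{\alpha_n}^{\otimes n}\|Q_0^{\otimes n})\right| \le e^{-\xi_2\omega_n\sqrt{n}}.$$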

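In particular, we obtain a characterization of the asymptotic scaling.

Corollary 3. Consider a discrete memoryless covert communication channel such that $Q_0$ is not a mixture of
$\{Q_i\}_{i\in[\![1,N]\!]}$ and, $\forall i \in [\![1,N]\!]$, $Q_i \ll Q_0$ and $P_i \ll P_0$. Let $\{p_i\}_{i\in[\![1,N]\!]} \in [0;1]^N$ be such that $\sum_{i=1}^{N}p_i = 1$. Then,
there exist covert communication schemes such that

$$\lim_{n\to\infty}\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}) = 0, \qquad \lim_{n\to\infty}P_{\mathrm{err}} = 0,$$

$$\lim_{n\to\infty}\frac{\log M}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}} = \sqrt{\frac{2}{\chi_2\left(\sum_{i=1}^{N}p_i Q_i\,\middle\|\,Q_0\right)}}\;\sum_{i=1}^{N}p_i\,\mathbb{D}(P_i\|P_0),$$

$$\lim_{n\to\infty}\frac{\log K}{\sqrt{n\,\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})}} = \sqrt{\frac{2}{\chi_2\left(\sum_{i=1}^{N}p_i Q_i\,\middle\|\,Q_0\right)}}\;\left[\sum_{i=1}^{N}p_i\left(\mathbb{D}(Q_i\|Q_0) - \mathbb{D}(P_i\|P_0)\right)\right]^+.$$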


One can also show the optimality of the scalings by following Q, Q and adapting the proof of Theorem]^ 


C. Covert and secret communication 

The problem as formulated in Section only requires communication to be undetectable but does not prevent the 
warden from extracting information about the transmitted message. To address this, one could consider an additional 
semantic secrecy [28] constraint of the form

$$\forall p_W \quad \lim_{n\to\infty}\mathbb{I}(W; Z^n) = 0. \qquad (102)$$

The problem is then similar to the effective secrecy introduced in [18], [29] in a regime of undetectable communication,
and similar to the “hidable and deniable” communication setting in [8].
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The architecture studied in Section already satisfies this condition because the modulation as per ( [T9| ) performs 
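a one-time pad of the encoded message bits with the key bits $S$. If one does not wish to use an extra key for secrecy,
then the next theorem shows that semantic secrecy may be obtained “for free” when $\mathbb{D}(P_1\|P_0) > \mathbb{D}(Q_1\|Q_0)$, by
using a code for the wiretap channel instead of a code for reliable communication.

Theorem 5. Consider a discrete memoryless covert communication channel with $P_1 \ll P_0$, $Q_1 \ll Q_0$, and $Q_1 \neq Q_0$.
Let $\alpha_n = \frac{\omega_n}{\sqrt{n}}$ with $\omega_n$ chosen as in (13), so that in particular $\omega_n \in o(1)$ as $n \to \infty$. For any $\xi \in ]0;1[$, there exist
$\xi_1, \xi_2, \xi_3 > 0$ depending on $\xi$, $W_{Y|X}$, $W_{Z|X}$, and a covert communication scheme such that, for $n$ large enough,

$$\log M = (1-\xi)\,\omega_n\sqrt{n}\,\mathbb{D}(P_1\|P_0), \qquad \log K = (1+\xi)\,\omega_n\sqrt{n}\,\mathbb{D}(Q_1\|Q_0),$$

and

$$P_{\mathrm{err}} \le e^{-\xi_1\omega_n\sqrt{n}}, \qquad \left|\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}) - \mathbb{D}(Q_{\alpha_n}^{\otimes n}\|Q_0^{\otimes n})\right| \le e^{-\xi_2\omega_n\sqrt{n}}, \qquad \forall p_W \;\; \mathbb{I}(W; Z^n) \le e^{-\xi_3\omega_n\sqrt{n}}.$$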

Proof: We only sketch the details here for brevity. Let $\xi \in ]0;1[$ and $n \in \mathbb{N}^*$ sufficiently large.

If $(1-\xi)\,\mathbb{D}(P_1\|P_0) \le (1+\xi)\,\mathbb{D}(Q_1\|Q_0)$, we know from Theorem 2 that we may transmit $\log M = (1-\xi)\,\omega_n\sqrt{n}\,\mathbb{D}(P_1\|P_0)$
message bits with the help of $\omega_n\sqrt{n}\left((1+\xi)\,\mathbb{D}(Q_1\|Q_0) - (1-\xi)\,\mathbb{D}(P_1\|P_0)\right)$ key bits. One may
render the message bits secret by performing a one-time pad requiring another $(1-\xi)\,\omega_n\sqrt{n}\,\mathbb{D}(P_1\|P_0)$ key bits, for
a total of $(1+\xi)\,\omega_n\sqrt{n}\,\mathbb{D}(Q_1\|Q_0)$ key bits.

If $(1-\xi)\,\mathbb{D}(P_1\|P_0) > (1+\xi)\,\mathbb{D}(Q_1\|Q_0)$, we modify the random coding argument of Theorem 2 as follows. Let
$M, M' \in \mathbb{N}^*$. Generate $MM'$ codewords $\mathbf{x}_{ij} \in \{x_0, x_1\}^n$ with $i \in [\![1,M]\!]$ and $j \in [\![1,M']\!]$. The index $i$ is used to
encode a message $W$ while $j$ is used to encode another message $W'$. Following the exact same reliability analysis
as in the proof of Theorem 2, we conclude that if

$$\log M + \log M' = (1-\xi)\,\omega_n\sqrt{n}\,\mathbb{D}(P_1\|P_0) \qquad (103)$$

then $\mathbb{E}(P_{\mathrm{err}}) \le e^{-\rho_1\omega_n\sqrt{n}}$ for some $\rho_1 > 0$. Following the principle of achieving secrecy from resolvability [18], we
may also prove that if

$$\log M' = (1+\xi)\,\omega_n\sqrt{n}\,\mathbb{D}(Q_1\|Q_0) \qquad (104)$$

then $\mathbb{E}(\mathbb{I}(W; Z^n)) \le e^{-\rho_3\omega_n\sqrt{n}}$ for some $\rho_3 > 0$. In addition, since $\log M + \log M' \ge (1+\xi)\,\omega_n\sqrt{n}\,\mathbb{D}(Q_1\|Q_0)$,
covertness follows “for free” using the same arguments as in Theorem 2. Finally, the bits of $W'$ may be protected by
a one-time pad, requiring $(1+\xi)\,\omega_n\sqrt{n}\,\mathbb{D}(Q_1\|Q_0)$ key bits. The expurgation argument leading to semantic secrecy
is standard, e.g., [30, Lemma 1]. ■

The different regimes of covert and secret com munication are illustrated in Fig. which shows the asymptotic 
number of messages bits and keys bits scaled by ^ ^ ^ function of D(Pi||Po) for a fixed value of 

$\mathbb{D}(Q_1\|Q_0)$. As depicted by the different colors in Fig. 4, the key bits may be used either for covertness or for secrecy.
Similarly, some message bits are intrinsically covert and secret, while others require the use of a secret key. For
$\mathbb{D}(P_1\|P_0) \le \mathbb{D}(Q_1\|Q_0)$, secret keys are required for both covertness and secrecy, while for $\mathbb{D}(P_1\|P_0) > \mathbb{D}(Q_1\|Q_0)$,
secret keys are only required for added secrecy. Irrespective of the regime, the total number of secret key bits
remains the same.


D. Gaussian channels 

Gaussian channels are of particular practical interest with the innocent symbol $x_0 = 0$. Lemma 3 still applies to
continuous channels but Lemma 4 does not since $\mu_0 = 0$. Nevertheless, one may establish a slightly weaker result
in terms of the total variation. Since $\alpha + \beta \ge 1 - \mathbb{V}(\hat{Q}^n, Q_0^{\otimes n})$, it suffices to establish that $\mathbb{V}(\hat{Q}^n, Q_0^{\otimes n})$ vanishes
to ensure covert communications. As shown in Appendix F, one may adapt the proof of [26, Theorem VII.1] to
establish the following.

Lemma 5. For any channel $(\mathcal{X}, W_{Z|X}, \mathcal{Z})$ and for any $\tau > 0$,

$$\mathbb{E}\left(\mathbb{V}(\hat{Q}^n, Q_{\alpha_n}^{\otimes n})\right) \le \mathbb{P}_{W_{Z|X}^{\otimes n}\Pi_{\alpha_n}^{\otimes n}}\left(\log\frac{W_{Z|X}^{\otimes n}(\mathbf{Z}|\mathbf{X})}{Q_0^{\otimes n}(\mathbf{Z})} > \tau\right) + \frac{1}{2}\sqrt{\frac{e^{\tau}}{MK}}. \qquad (105)$$

[Figure 4: scaled number of message and key bits as a function of $\mathbb{D}(P_1\|P_0)$, with $\mathbb{D}(Q_1\|Q_0)$ marked on the horizontal axis. Legend: message; message intrinsically covert/secret; key for secrecy; key for covertness.]

Fig. 4. Illustration of different regimes of covert and secret communication.

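Consequently, one may establish the following result.

Theorem 6. Consider a continuous memoryless covert communication channel with $P_1 \ll P_0$, $Q_1 \ll Q_0$, and
$Q_1 \neq Q_0$. Assume that the random variables $\log\frac{Q_1(Z)}{Q_0(Z)}$ with $Z \sim Q_1$ and $\log\frac{P_1(Y)}{P_0(Y)}$ with $Y \sim P_1$ are sub-Gaussian,
and $\int\frac{P_1(y)^2}{P_0(y)}\,dy < \infty$. Let $\alpha_n = \frac{\omega_n}{\sqrt{n}}$ with $\omega_n$ chosen as in (13), so that in particular $\omega_n \in o(1)$ as $n \to \infty$. For any
$\xi \in ]0;1[$, there exist $\xi_1, \xi_2 > 0$ depending on $\xi$, $W_{Y|X}$, $W_{Z|X}$, and a covert communication scheme as in Fig. 3 such that, for $n$ large enough,

$$\log M = (1-\xi)\,\omega_n\sqrt{n}\,\mathbb{D}(P_1\|P_0),$$

$$\log K = \omega_n\sqrt{n}\left[(1+\xi)\,\mathbb{D}(Q_1\|Q_0) - (1-\xi)\,\mathbb{D}(P_1\|P_0)\right]^+,$$

and

$$P_{\mathrm{err}} \le e^{-\xi_1\omega_n\sqrt{n}}, \qquad \mathbb{V}(\hat{Q}^n, Q_{\alpha_n}^{\otimes n}) \le e^{-\xi_2\omega_n\sqrt{n}}.$$

For an AWGN channel, note that $P_i \sim \mathcal{N}(x_i, \sigma^2)$. One may check that $\log\frac{P_1(Y)}{P_0(Y)}$ for $Y \sim P_1$ is sub-Gaussian
since

$$\log\frac{P_1(Y)}{P_0(Y)} = \frac{x_1}{\sigma^2}\,Y - \frac{x_1^2}{2\sigma^2}, \qquad (106)$$

which is a Gaussian random variable. Also,

$$\int\frac{P_1(y)^2}{P_0(y)}\,dy = e^{\frac{x_1^2}{\sigma^2}} < \infty. \qquad (107)$$

One can finally show covertness by using the triangle inequality to obtain

$$\mathbb{V}(\hat{Q}^n, Q_0^{\otimes n}) \le \mathbb{V}(\hat{Q}^n, Q_{\alpha_n}^{\otimes n}) + \mathbb{V}(Q_{\alpha_n}^{\otimes n}, Q_0^{\otimes n}) \le e^{-\xi_2\omega_n\sqrt{n}} + \mathbb{V}(Q_{\alpha_n}^{\otimes n}, Q_0^{\otimes n}). \qquad (108)$$

As in the case of DMCs, no key is required if $\mathbb{D}(P_1\|P_0) > \mathbb{D}(Q_1\|Q_0)$.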
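The finiteness condition for the AWGN channel can be checked numerically. The sketch below is an illustration only: it estimates $\int P_1(y)^2/P_0(y)\,dy$ for $P_0 = \mathcal{N}(0,\sigma^2)$ and $P_1 = \mathcal{N}(x_1,\sigma^2)$ with a plain trapezoidal rule and compares it with $e^{x_1^2/\sigma^2}$. The function names, integration window, and step count are assumptions made for this example.

```python
import math

def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def chi_integral(x1, sigma, half_width=12.0, steps=200_000):
    """Trapezoidal estimate of the integral of P1(y)^2 / P0(y) over a wide window."""
    # The integrand is a Gaussian-shaped bump centred at 2 * x1, so the window is centred there.
    lo = 2 * x1 - half_width * sigma
    hi = 2 * x1 + half_width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * gaussian_pdf(y, x1, sigma) ** 2 / gaussian_pdf(y, 0.0, sigma)
    return total * h

for x1, sigma in ((1.0, 1.0), (0.5, 2.0)):
    print(x1, sigma, chi_integral(x1, sigma), math.exp(x1 ** 2 / sigma ** 2))
```

The two printed values agree closely for each parameter pair, consistent with the closed form.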
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Appendix A 

Kullback-Leibler (KL) divergence and hypothesis testing 

In this section, we provide a brief discussion in the spirit of to provide an alternative operational significance of 
the KL divergence for hypothesis testing. While KL divergence naturally appears in the exponents of the probability 
of false alarm and missed-detection when testing whether an i.i.d. process is generated according to one of two 
different distributions, this interpretation is not valid in the present setting since the distribution $\hat{Q}^n$ is not i.i.d.

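Nevertheless, assume that the warden's hypothesis test is characterized by Type I error $\alpha$, Type II error $\beta$, and a
rejection region $\mathcal{R}$. This hypothesis test may be viewed as a “black box” that outputs a Bernoulli random variable $B_p$,
indicating “0” if $H_0$ is accepted, or “1” if $H_0$ is rejected. If $H_0$ is true then $p = \alpha$ by definition; alternatively, if $H_1$
is true, then $p = 1 - \beta$ by definition. The performance of the test may be captured by computing the Jensen-Shannon
divergence [32]

$$S(B_\alpha, B_{1-\beta}) \triangleq \frac{1}{2}\mathbb{D}\left(B_\alpha\,\middle\|\,B_{\frac{\alpha+1-\beta}{2}}\right) + \frac{1}{2}\mathbb{D}\left(B_{1-\beta}\,\middle\|\,B_{\frac{\alpha+1-\beta}{2}}\right) = \frac{1}{2}\mathbb{D}(B_{1-\beta}\|B_\alpha) - \mathbb{D}\left(B_{\frac{\alpha+1-\beta}{2}}\,\middle\|\,B_\alpha\right). \qquad (109)$$

In fact, it is known that $0 \le S(B_\alpha, B_{1-\beta}) \le 1$, with $S(B_\alpha, B_{1-\beta}) = 0$ if and only if $\alpha + \beta = 1$, and $S(B_\alpha, B_{1-\beta}) = 1$
if and only if $\alpha = \beta = 0$. Hence, the value of $S(B_\alpha, B_{1-\beta})$ is an indication of how effective the test is.

To achieve covert communication, one must therefore ensure that $S(B_\alpha, B_{1-\beta})$ is small. By application of the
log-sum inequality, one obtains

$$S(B_\alpha, B_{1-\beta}) \le \mathbb{D}(B_{1-\beta}\|B_\alpha) \qquad (110)$$

$$= (1-\beta)\log\frac{1-\beta}{\alpha} + \beta\log\frac{\beta}{1-\alpha} \qquad (111)$$

$$= \left(\sum_{\mathbf{z}\in\mathcal{R}}\hat{Q}^n(\mathbf{z})\right)\log\frac{\sum_{\mathbf{z}\in\mathcal{R}}\hat{Q}^n(\mathbf{z})}{\sum_{\mathbf{z}\in\mathcal{R}}Q_0^{\otimes n}(\mathbf{z})} + \left(\sum_{\mathbf{z}\notin\mathcal{R}}\hat{Q}^n(\mathbf{z})\right)\log\frac{\sum_{\mathbf{z}\notin\mathcal{R}}\hat{Q}^n(\mathbf{z})}{\sum_{\mathbf{z}\notin\mathcal{R}}Q_0^{\otimes n}(\mathbf{z})} \qquad (112)$$

$$\le \sum_{\mathbf{z}}\hat{Q}^n(\mathbf{z})\log\frac{\hat{Q}^n(\mathbf{z})}{Q_0^{\otimes n}(\mathbf{z})} \qquad (113)$$

$$= \mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}). \qquad (114)$$

Hence, a sufficient condition to make the test ineffective is again to minimize $\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n})$.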
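The relation between the Jensen-Shannon divergence of the warden's test and the relative entropy between its two Bernoulli outputs can be verified numerically. The sketch below, with hypothetical function names and test values, computes the divergence from its standard mixture definition and from the expression $\frac{1}{2}\mathbb{D}(B_{1-\beta}\|B_\alpha) - \mathbb{D}(B_{(\alpha+1-\beta)/2}\|B_\alpha)$, using logarithms to the base 2.

```python
import math

def kl_bernoulli(p, q):
    """D(B_p || B_q) in bits with the convention 0 log 0 = 0; assumes 0 < q < 1."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            total += a * math.log2(a / b)
    return total

def js_definition(alpha, beta):
    """Jensen-Shannon divergence between B_alpha and B_{1-beta} via the mixture."""
    m = (alpha + 1 - beta) / 2
    return 0.5 * kl_bernoulli(alpha, m) + 0.5 * kl_bernoulli(1 - beta, m)

def js_alternative(alpha, beta):
    """Same quantity written with relative entropies taken against B_alpha."""
    m = (alpha + 1 - beta) / 2
    return 0.5 * kl_bernoulli(1 - beta, alpha) - kl_bernoulli(m, alpha)

alpha, beta = 0.1, 0.3                        # an informative test
print(js_definition(alpha, beta), js_alternative(alpha, beta))
print(kl_bernoulli(1 - beta, alpha))          # upper bound on the value above
print(js_definition(0.4, 0.6))                # blind test: alpha + beta = 1
```

A blind test, for which $\alpha + \beta = 1$, yields a divergence of zero, matching the characterization of an ineffective detector.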


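Appendix B
Proof of Lemma 1

Note that

$$\mathbb{D}(Q_{\alpha_n}\|Q_0) = \sum_z Q_{\alpha_n}(z)\log\frac{Q_{\alpha_n}(z)}{Q_0(z)} = \sum_z\left(Q_0(z) + \alpha_n(Q_1(z) - Q_0(z))\right)\log\left(1 + \alpha_n\frac{Q_1(z) - Q_0(z)}{Q_0(z)}\right). \qquad (115)$$

Using the inequality $\log(1+x) \le x - \frac{x^2}{2} + \frac{x^3}{3}$ for $x > -1$, we obtain

$$\mathbb{D}(Q_{\alpha_n}\|Q_0) \le \sum_z\left(Q_0(z) + \alpha_n(Q_1(z) - Q_0(z))\right)\left(\alpha_n\frac{Q_1(z) - Q_0(z)}{Q_0(z)} - \frac{\alpha_n^2}{2}\left(\frac{Q_1(z) - Q_0(z)}{Q_0(z)}\right)^2 + \frac{\alpha_n^3}{3}\left(\frac{Q_1(z) - Q_0(z)}{Q_0(z)}\right)^3\right) \qquad (116)$$

$$= \frac{\alpha_n^2}{2}\chi_2(Q_1\|Q_0) - \frac{\alpha_n^3}{6}\chi_3(Q_1\|Q_0) + \frac{\alpha_n^4}{3}\chi_4(Q_1\|Q_0),$$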


®with a log to the base 2 


(117) 
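The upper bound on $\mathbb{D}(Q_{\alpha_n}\|Q_0)$ can be illustrated on a small example. The sketch below assumes that $\chi_k(Q_1\|Q_0)$ denotes $\sum_z (Q_1(z)-Q_0(z))^k / Q_0(z)^{k-1}$, which is the form obtained by expanding the product in (116); the distributions and values of $\alpha$ are arbitrary illustrative choices.

```python
import math

def kl(p, q):
    """Relative entropy D(p||q) in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def chi(k, q1, q0):
    """Assumed definition: chi_k(Q1||Q0) = sum_z (Q1(z) - Q0(z))^k / Q0(z)^(k-1)."""
    return sum((a - b) ** k / b ** (k - 1) for a, b in zip(q1, q0))

def upper_bound(alpha, q1, q0):
    """Polynomial bound obtained from log(1+x) <= x - x^2/2 + x^3/3."""
    return ((alpha ** 2 / 2) * chi(2, q1, q0)
            - (alpha ** 3 / 6) * chi(3, q1, q0)
            + (alpha ** 4 / 3) * chi(4, q1, q0))

Q0 = [0.5, 0.3, 0.2]
Q1 = [0.2, 0.2, 0.6]
for alpha in (0.2, 0.05, 0.01):
    mixture = [(1 - alpha) * b + alpha * a for a, b in zip(Q1, Q0)]
    print(alpha, kl(mixture, Q0), upper_bound(alpha, Q1, Q0))
```

As $\alpha$ shrinks, both columns approach $\frac{\alpha^2}{2}\chi_2(Q_1\|Q_0)$, which is the quadratic behavior behind the square-root law.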
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Using the inequalities log(l + x) > x — ^ for x ^ 0 and log(l + x) > x — ^ + ^ for x G [— 2 ; 0], we obtain for 
an small enough]^ 

0{Qc. IlOo) > (Qo(z) + a„(Qi(z) - Qa[z))) ^ j 


+ E (OoW + a-WiW-OoW))^ <1I8> 

2e^:QiU)-QoU)<0 ^ ^ 


a: 


1 


2 at 


= -fx,{Qi\\ Qo) + < Qo) - 2^3(^111 Qo )) + -^%iQi\\ Qo) ■ 

Finally, note that 

l{Px-,Wzix) = {I - anMQoWQaJ + anO{Q^\\Q, 

= (1 - an)n{Qo\\QaJ + ann{Qi\\Qo) + an'^Qi{z)log 

= anD(QillQo) — ®(Qo„||<3o) 

Combining (|117 1, (119l, and (120l, we obtain the desired results. 


(119) 


( 120 ) 


For any 7 > 0, define the set 


Appendix C 
Proof of Lemma[2] 


5” = <^ X G {xo,xi}” : log 


< 7 


( 121 ) 


n«”(x) 

For i G [1, K}, we denote the expected value over all random codewords Notice that 


E(D(Px||n«-)) =E 


n®«fx) 


= E 


K 


EE^iF = Xi}i°g - ^ -7 

„ ( J iFn®^(x) 


X i=l 


^ ^ {x = ii} E^i log 


i=l X X 


(“) ^ 1 


^EiEE na„,,(xi)l{x = iJlogE. 


[b) 


j=l X X, 

K 


eEi{x = x4 ' 

xn®”(x) 

A:n®”(x) 


E^EEnS ,(x.)i{x = x,}log f ^ 

^ ^ ^ ' I xnr(x) K n®’;j(x) 


i=l X X 


< En;.,,(*)iog 


1 


xn®” (x) 


< E EnS(x)iog(^^ + E) + E n:.,(x)iog 


x^5" 


iFII®” (x) 


+ — 


XScS" 


xn®” (x) 


( 122 ) 

(123) 

(124) 

(125) 

(126) 

(127) 

(128) 


^a„ should be such that Mz £ Z with Qo{z) > 0 and Qi{z) — Qof^) < 0 we have an(Qi{z} — Qo{z)) 7 —^Qo{z) 
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where (a) follows by Jensen’s inequality and (b) holds because E^j(l{x = Xj}) = ^(x) for i / j. If x G S^, 

we have 1 ^ n®”(x)e'^ and 


log 


+ 


i^n®” (x) 


^ log 


e^ n^Jx) 1 

Kn^’^(x) An 


^ = log 


K Ar, 


If X ^ 5” and for n large enough so that an < 1 — an and a^ ^ An, we have 


log 

Combining ( |I28 )-(I30l, we obtain 


1 


1 


it'll®"- (x) An 


( 


1 


1 


+ ^ ^ log — + ^ ^ nlog—. 


\a, 


Xr 


0^71 


E(D(Px||n*-)) ^ ^ log —Pnr„ (X ^ s;) +log(^ + ^ 

^ \ i\ '^n 


OLr 




The result follows by observing that 


(129) 


(130) 


(131) 


Pnr„ (X i s^) 




Pn®" log 


1 


- an)"—PP(X) 




7 


_ Q/ 

Pnj" { supp(X) log- - -n\og{l - an) 


OLr 


Pn- supp(X) ^ 


7 + nlog(l - an) 


log 


1 —Q:„ 



(132) 

(133) 

(134) 

(135) 


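Appendix D
Proof of Lemma 3

The result of the lemma could be viewed as a specific application of the $\kappa\beta$ bound [33]. However, for clarity and
completeness, we provide here a proof from first principles. From the definition of the encoder/decoder and a union
bound, there are three error events to consider.

• The codeword $\mathbf{x}_{ij}$ is transmitted but $(\mathbf{x}_{ij}, \mathbf{y}) \notin \mathcal{A}_\gamma^n$.

• The codeword $\mathbf{x}_{ij}$ is transmitted but there exists $\mathbf{x}_{kj}$ with $k \neq i$ such that $(\mathbf{x}_{kj}, \mathbf{y}) \in \mathcal{A}_\gamma^n$.

• No communication happens but the decoder finds a codeword $\mathbf{x}_{ij}$ such that $(\mathbf{x}_{ij}, \mathbf{y}) \in \mathcal{A}_\gamma^n$.

Hence, we obtain

$$\mathbb{E}(P_{\mathrm{err}}) \le \mathbb{E}\left(\frac{1}{MK}\sum_{\mathbf{y}}\sum_{i=1}^{M}\sum_{j=1}^{K}W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_{ij})\,\mathbb{1}\left\{(\mathbf{X}_{ij},\mathbf{y}) \notin \mathcal{A}_\gamma^n \text{ or } \exists k \neq i \text{ s.t. } (\mathbf{X}_{kj},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\}\right) + \mathbb{E}\left(\sum_{\mathbf{y}}P_0^{\otimes n}(\mathbf{y})\,\mathbb{1}\left\{\exists i \text{ s.t. } (\mathbf{X}_{ij},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\}\right)$$

$$= \mathbb{E}\left(\sum_{\mathbf{y}}W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_{11})\,\mathbb{1}\left\{(\mathbf{X}_{11},\mathbf{y}) \notin \mathcal{A}_\gamma^n \text{ or } \exists k \neq 1 \text{ s.t. } (\mathbf{X}_{k1},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\}\right) + \mathbb{E}\left(\sum_{\mathbf{y}}P_0^{\otimes n}(\mathbf{y})\,\mathbb{1}\left\{\exists i \text{ s.t. } (\mathbf{X}_{i1},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\}\right)$$

$$\le \mathbb{E}\left(\sum_{\mathbf{y}}W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_{11})\,\mathbb{1}\left\{(\mathbf{X}_{11},\mathbf{y}) \notin \mathcal{A}_\gamma^n\right\}\right) + \sum_{k\neq 1}\mathbb{E}\left(\sum_{\mathbf{y}}W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_{11})\,\mathbb{1}\left\{(\mathbf{X}_{k1},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\}\right) + \sum_{i=1}^{M}\mathbb{E}\left(\sum_{\mathbf{y}}P_0^{\otimes n}(\mathbf{y})\,\mathbb{1}\left\{(\mathbf{X}_{i1},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\}\right). \qquad (136)$$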
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Note that the first term on the right-hand side of (1361 is 


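$$\mathbb{P}_{W_{Y|X}^{\otimes n}\Pi_{\alpha_n}^{\otimes n}}\left(\log\frac{W_{Y|X}^{\otimes n}(\mathbf{Y}|\mathbf{X})}{P_0^{\otimes n}(\mathbf{Y})} \le \gamma\right). \qquad (137)$$

We now analyze the second term on the right-hand side of (136). For any $k \neq 1$,

$$\mathbb{E}\left(\sum_{\mathbf{y}}W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_{11})\,\mathbb{1}\left\{(\mathbf{X}_{k1},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\}\right) = \sum_{\mathbf{x}}\sum_{\mathbf{y}}P_{\alpha_n}^{\otimes n}(\mathbf{y})\,\Pi_{\alpha_n}^{\otimes n}(\mathbf{x})\,\mathbb{1}\left\{(\mathbf{x},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\} \qquad (138)$$

$$= \sum_{\mathbf{x}}\sum_{\mathbf{y}}P_0^{\otimes n}(\mathbf{y})\,\Pi_{\alpha_n}^{\otimes n}(\mathbf{x})\,\frac{P_{\alpha_n}^{\otimes n}(\mathbf{y})}{P_0^{\otimes n}(\mathbf{y})}\,\mathbb{1}\left\{(\mathbf{x},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\} \qquad (139)$$

$$\overset{(a)}{\le} e^{-\gamma}\sum_{\mathbf{x}}\sum_{\mathbf{y}}W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\,\Pi_{\alpha_n}^{\otimes n}(\mathbf{x})\,\frac{P_{\alpha_n}^{\otimes n}(\mathbf{y})}{P_0^{\otimes n}(\mathbf{y})}\,\mathbb{1}\left\{(\mathbf{x},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\} \qquad (140)$$

$$\le e^{-\gamma}\,\mathbb{E}_{P_{\alpha_n}^{\otimes n}}\left(\frac{P_{\alpha_n}^{\otimes n}(\mathbf{Y})}{P_0^{\otimes n}(\mathbf{Y})}\right), \qquad (141)$$

where (a) follows from $P_0^{\otimes n}(\mathbf{y}) \le W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\,e^{-\gamma}$ for $(\mathbf{x},\mathbf{y}) \in \mathcal{A}_\gamma^n$. Since $P_{\alpha_n}^{\otimes n}$ and $P_0^{\otimes n}$ are product distributions,
we have

$$\mathbb{E}_{P_{\alpha_n}^{\otimes n}}\left(\frac{P_{\alpha_n}^{\otimes n}(\mathbf{Y})}{P_0^{\otimes n}(\mathbf{Y})}\right) = \prod_{i=1}^{n}\mathbb{E}_{P_{\alpha_n}}\left(\frac{P_{\alpha_n}(Y)}{P_0(Y)}\right). \qquad (142)$$

Next, note that

$$\mathbb{E}_{P_{\alpha_n}}\left(\frac{P_{\alpha_n}(Y)}{P_0(Y)}\right) = \mathbb{E}_{P_{\alpha_n}}\left(1 - \alpha_n + \alpha_n\frac{P_1(Y)}{P_0(Y)}\right) \qquad (143)$$

$$= 1 - \alpha_n + \alpha_n\sum_y\left((1-\alpha_n)P_0(y) + \alpha_n P_1(y)\right)\frac{P_1(y)}{P_0(y)} \qquad (144)$$

$$= 1 - \alpha_n + \alpha_n\left(1 - \alpha_n + \alpha_n\sum_y\frac{P_1(y)^2}{P_0(y)}\right) \qquad (145)$$

$$= 1 + \alpha_n^2(C - 1), \quad\text{with } C \triangleq \sum_y\frac{P_1(y)^2}{P_0(y)}. \qquad (146)$$

Consequently,

$$\mathbb{E}_{P_{\alpha_n}^{\otimes n}}\left(\frac{P_{\alpha_n}^{\otimes n}(\mathbf{Y})}{P_0^{\otimes n}(\mathbf{Y})}\right) = \left(1 + \alpha_n^2(C-1)\right)^n \le \exp\left(n\alpha_n^2(C-1)\right) = \exp\left(\omega_n^2(C-1)\right). \qquad (147)$$

Hence, we obtain

$$\mathbb{E}\left(\sum_{\mathbf{y}}W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_{11})\,\mathbb{1}\left\{(\mathbf{X}_{k1},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\}\right) \le e^{-\gamma}\exp\left(\omega_n^2(C-1)\right). \qquad (148)$$

Finally, the third term on the right-hand side of (136) may be similarly bounded for any $i$ by

$$\mathbb{E}\left(\sum_{\mathbf{y}}P_0^{\otimes n}(\mathbf{y})\,\mathbb{1}\left\{(\mathbf{X}_{i1},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\}\right) = \sum_{\mathbf{x}}\sum_{\mathbf{y}}P_0^{\otimes n}(\mathbf{y})\,\Pi_{\alpha_n}^{\otimes n}(\mathbf{x})\,\mathbb{1}\left\{(\mathbf{x},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\} \qquad (149)$$

$$\le \sum_{\mathbf{x}}\sum_{\mathbf{y}}W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x})\,e^{-\gamma}\,\Pi_{\alpha_n}^{\otimes n}(\mathbf{x})\,\mathbb{1}\left\{(\mathbf{x},\mathbf{y}) \in \mathcal{A}_\gamma^n\right\} \qquad (150)$$

$$\le e^{-\gamma}. \qquad (151)$$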
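The single-letter identity behind the factor $\exp(\omega_n^2(C-1))$ is easy to check numerically. The sketch below, with hypothetical distributions and parameter values, evaluates $\mathbb{E}_{P_{\alpha}}(P_{\alpha}(Y)/P_0(Y))$ directly for the mixture $P_\alpha = (1-\alpha)P_0 + \alpha P_1$ and compares it with $1 + \alpha^2(C-1)$.

```python
import math

def second_moment_ratio(alpha, P0, P1):
    """E_{P_alpha}[P_alpha(Y)/P0(Y)] for the mixture P_alpha = (1-alpha) P0 + alpha P1."""
    mixture = [(1 - alpha) * a + alpha * b for a, b in zip(P0, P1)]
    return sum(m * m / a for m, a in zip(mixture, P0))

def closed_form(alpha, P0, P1):
    """1 + alpha^2 (C - 1) with C = sum_y P1(y)^2 / P0(y)."""
    C = sum(b * b / a for a, b in zip(P0, P1))
    return 1 + alpha ** 2 * (C - 1)

P0 = [0.7, 0.2, 0.1]          # assumed output distribution for x0
P1 = [0.1, 0.3, 0.6]          # assumed output distribution for x1
n = 10_000
omega_n = 0.5                 # illustrative value only
alpha_n = omega_n / math.sqrt(n)

single = closed_form(alpha_n, P0, P1)
C = sum(b * b / a for a, b in zip(P0, P1))
print(second_moment_ratio(alpha_n, P0, P1), single)
print(single ** n, math.exp(omega_n ** 2 * (C - 1)))
```

The second printed line shows the $n$-fold product staying below $\exp(\omega_n^2(C-1))$, a quantity that does not grow with $n$.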
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Define the set 


Appendix E 
Proof of Lemma H] 


r w^v^iz\x) 

= { (x,z) log . < T 


(152) 


QTi^) 

For {i,j) G [1, MJ x [1, K}, we denote the expected value over all random codewords {Xfc£}(fc/)6|i,M]x[i,A']\{(*j)} 
hy Notice that 


E(D(Q-||Q*-)) =E J]g-(z)log 


^ ^ Ef=iEf=iWirx(^ixM) 


Eiv^EE»'"x("|xoiog 


i=l j=l 


MKQl^Jz) 

fK 2^ 2^ 2^ [ iog MKQTSz) 

M 


XIK 


i=l j=l z 
M K 


< w E E E E «'irx(^i-ons(x.,) logE. 


MK 

{b) _L 
MK 


i=l j=l z 
M K 


Y.Y.Y.Y. log 


i=l j=l z 


~MKQ%^{z) 


' MK-l 

MKQ%^{z) ^ MK 


= EE»'“A'("l’‘)n"W‘°BI ■■•M .' +- 


(153) 


\MKQ%1{z) MK J’ 
where (a) follows hy Jensen’s inequality and (6) follow because E^jj^W||^(z|Xfc£)^ = Q^{z) for {k,i) ^ {i,j) 


If (x, z) G B!^, we have 


log (EliklfW + EEA) < lo+1) < «”W 


MXQ®"(z) MX 

If (x, z) ^ ;B", we have 

Mx-i 


MKQ%^{z) J^MKQ%^{z)' 


(154) 


log 


+ 


^ log 


+ 1 I ^ n log 


(1 CXji)fJ‘0 


MKQ'^Jl{z)^ MK j 

Comhining ( |153[ )-( [T55] ), we obtain 

E(D(Q-||Q®-)) ^ nlog—J] j;iE|f^(z|x)n®-(x)l{(x,z) ^ S"} 

{1 CXn)m ^ ^ 

+ EE»'“A>l-)nS(A)jg5f ^1{(A,Z) € B-;} 

z X ^a„\ / 

2 


(155) 


^ nlog- 


nr((X,Z)^e^) + 


{l-an)lio ^ ' MK' 

For $n$ large enough so that $1 - \alpha_n \ge 1/2$, we obtain the desired result.
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Appendix F 
Proof of Lemma [5] 


We define 


M K 




i=l j=l 
M K 


(5®W = EE»'|.v(“l*«)i^i«’‘«.^) <t e?} 


i=l j=l 


SO that Also note that E(Q(z)) = Q^{z). Hence, 

E(||Q"-Q^:i|) ^ 2 ^k(|qW(z)-e(q(i)(z))|) + 2 J]e(|q(2)(^)_E 


z z 

The first term on the right-hand side of ( |158| ) is hounded as follows. 

1 A \ . 1 

2 


J]e(|qW(z) - e(qW(z)) \)^IY1 V^ar(Q(i)(z)) 


with 


M K 


Var 


z = 


E E M^Var(»y||',,(z|X.j)l{(Xy, z) € S; 

i=l j=l 

^ WarfH^|f^(z|Xn)l{(Xn,z) E 


< 


MK 
1 

MK 
1 

“ ~MK 
(a) 1 


MK 


®n-.-,(w'J|*A-{z|X)2l{(X,z) e e?}) 
5;M'|f,f(z|xfnS(x)l{(x,z) £ B?) 

X 

j;»F|l';,(z|x)Qr(zKn“{x)l{(x,z) £ s;} 




Q®-(z)’ 


where (a) follows because VF||^(z|x) ^ (5g"'(z)e'^ for (x, z) E Hence 

Je^ 


z) -E 


By Jensen’s inequality and the concavity of x i-A yx, we have 


< - Vw—*—Vo®"fz)^ IQZ(^) 


Q®’*(z) 2 V MX 


Qr(^)' 




(156) 

(157) 

(158) 

(159) 

(160) 

(161) 

(162) 

(163) 

(164) 

(165) 

(166) 

(167) 


so that 
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The second term on the right-hand side of ( |158[ ). 

i E (I Q(2)(^) _ E (q(2) (z)) I) ^ E (g(2) 


(169) 


M K 


Z j=l J = 1 Xij 


= iPtyj 


/ »'||*^{Z|X) ' 


(171) 

(172) 


Appendix G 

Special cases of channels 

In this Appendix, we discuss some special cases of channels that have been excluded by the assumptions $P_1 \ll P_0$,
$Q_1 \ll Q_0$, and $Q_1 \neq Q_0$, made in Section III.

A. $Q_1$ is not absolutely continuous w.r.t. $Q_0$ or $Q_1 = Q_0$

If $Q_1$ is not absolutely continuous w.r.t. $Q_0$ then $\mathbb{D}(Q_1\|Q_0) = \infty$. Hence, for any $n \in \mathbb{N}^*$ and any sequence
$\mathbf{x} \in \{x_0, x_1\}^n$ distinct from the all-$x_0$ sequence, we have

$$\mathbb{D}\left(W_{Z^n|X^n=\mathbf{x}}^{\otimes n}\,\middle\|\,Q_0^{\otimes n}\right) = \sum_{i=1}^{n}\mathbb{D}\left(W_{Z|X=x_i}\,\middle\|\,Q_0\right) = \infty. \qquad (173)$$

Consequently, it is impossible to transmit covert bits.

If $Q_1 = Q_0$, for any $n \in \mathbb{N}^*$ and any transmitted sequence $\mathbf{x} \in \{x_0, x_1\}^n$, we have $\mathbb{D}\left(W_{Z^n|X^n=\mathbf{x}}^{\otimes n}\,\middle\|\,Q_0^{\otimes n}\right) = 0$;
the warden's observations are independent of the transmitted signals and always have distribution $Q_0^{\otimes n}$. One may
therefore use a standard error-control code for reliability over the main channel and transmit at non-vanishing rates
approaching the capacity of the main channel. The corresponding scaling of $\log M$ is $\Theta(n)$.

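B. $P_1$ is not absolutely continuous w.r.t. $P_0$

If $P_1$ is not absolutely continuous w.r.t. $P_0$, denoted $P_1 \not\ll P_0$, define

$$\mathcal{S} \triangleq \{y \in \mathcal{Y} : P_1(y) > 0 \text{ and } P_0(y) = 0\} \quad\text{and}\quad \kappa \triangleq \sum_{y\in\mathcal{S}}P_1(y). \qquad (174)$$

In other words, $\kappa$ is the probability that the symbol $x_1$ is identified without ambiguity at the channel output. We
then have the following.

Theorem 7. Consider a discrete memoryless covert communication channel with $P_1 \not\ll P_0$, $Q_1 \ll Q_0$, and $Q_1 \neq Q_0$.
Let $\kappa$ be defined as per (174) and let $\alpha_n = \frac{\omega_n}{\sqrt{n}}$ with $\omega_n$ chosen as in (13), so that in particular $\omega_n \in o(1)$ as $n \to \infty$. For any $\xi \in ]0;1[$, there exist
$\xi_1, \xi_2 > 0$ depending on $\xi$, $W_{Y|X}$, $W_{Z|X}$, and covert communication schemes such that, for all $n$ large enough,

$$\log M = (1-\xi)\,\kappa\left(\frac{1}{2} + \frac{\log\omega_n^{-1}}{\log n}\right)\omega_n\sqrt{n}\log n, \qquad \log K = 0,$$

and

$$P_{\mathrm{err}} \le e^{-\xi_1\omega_n\sqrt{n}}, \qquad \left|\mathbb{D}(\hat{Q}^n\|Q_0^{\otimes n}) - \mathbb{D}(Q_{\alpha_n}^{\otimes n}\|Q_0^{\otimes n})\right| \le e^{-\xi_2\omega_n\sqrt{n}}.$$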

Proof: The result follows with a modification of the proof of Theorem 2 to exploit the property $P_1 \not\ll P_0$. Let
$\delta > 0$, $M \in \mathbb{N}^*$, and $\alpha_n = \frac{\omega_n}{\sqrt{n}}$. Generate $M$ codewords $\mathbf{x}_i$ with $i \in [\![1,M]\!]$ independently according to the product
distribution $\Pi_{\alpha_n}^{\otimes n}$. Upon receiving $\mathbf{y}$, the decoder looks for symbols that belong to $\mathcal{S}$. Let $\mathcal{P}(\mathbf{y})$ denote the positions
of these symbols. Then,

• if $|\mathcal{P}(\mathbf{y})| < (1-\delta)\,n\kappa\alpha_n$, declare that $\hat{T} = 0$;

• else, if there exists a unique $i \in [\![1,M]\!]$ such that codeword $\mathbf{x}_i$ has $x_1$-symbols for all positions in $\mathcal{P}(\mathbf{y})$,
declare $\hat{T} = 1$ and output message $\hat{W} = i$;

• otherwise, declare an error.
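The keyless decoder just described can be sketched as follows. This is an illustrative sketch, not the authors' implementation: codewords are lists of 0/1 (with 1 standing for $x_1$), channel outputs are arbitrary hashable symbols, the set `S` collects the outputs that only $x_1$ can produce, and `threshold` stands for $(1-\delta)\,n\kappa\alpha_n$. All names are hypothetical.

```python
def positions_in_S(y, S):
    """Positions of the received symbols that can only have been caused by x1."""
    return [k for k, symbol in enumerate(y) if symbol in S]

def decode_unambiguous(y, codebook, S, threshold):
    """Return (T_hat, W_hat) following the three decoding rules."""
    pos = positions_in_S(y, S)
    if len(pos) < threshold:
        return (0, None)                      # too few revealing symbols: no transmission
    hits = [i for i, x in enumerate(codebook)
            if all(x[k] == 1 for k in pos)]   # codewords with x1 at every revealing position
    if len(hits) == 1:
        return (1, hits[0])
    return ("error", None)

# Toy example: output symbol 2 is only reachable from x1 (assumed alphabet {0, 1, 2}).
S = {2}
codebook = [[1, 1, 0, 0, 0], [0, 0, 1, 1, 0]]
print(decode_unambiguous([2, 2, 0, 0, 0], codebook, S, threshold=2))   # prints (1, 0)
print(decode_unambiguous([0, 1, 0, 0, 0], codebook, S, threshold=2))   # prints (0, None)
```

Because symbols in $\mathcal{S}$ never occur when $x_0$ is sent, the decoder cannot raise a false alarm whenever the threshold is positive.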
a) Channel reliability analysis: By construction, the decoder makes an error if any of the following events
occur.

• The codeword $\mathbf{x}_i$ is transmitted but there are fewer than $(1-\delta) n \kappa \alpha_n$ $x_1$-symbols in $\mathcal{P}(\mathbf{y})$.

• The codeword $\mathbf{x}_i$ is transmitted but there are multiple codewords with $x_1$-symbols in all positions in $\mathcal{P}(\mathbf{y})$.

• No communication takes place but there are more than $(1-\delta) n \kappa \alpha_n$ $x_1$-symbols in $\mathcal{P}(\mathbf{y})$.

By definition of $\mathcal{S}$, note that $|\mathcal{P}(\mathbf{y})| = 0$ if no communication takes place. Consequently, the probability of error
averaged over the random codebook generation satisfies
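\begin{align}
\mathbb{E}(P_{\mathrm{err}}) &\leq \mathbb{E}\left( \sum_{\mathbf{y}} \sum_{i=1}^{M} \frac{1}{M} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i) \mathbb{1}\left\{ |\mathcal{P}(\mathbf{y})| < (1-\delta) n \kappa \alpha_n \text{ or } \exists j \neq i \text{ such that } \forall k \in \mathcal{P}(\mathbf{y})\ X_{j,k} = X_{i,k} \right\} \right) \tag{175} \\
&\leq \mathbb{E}\left( \sum_{\mathbf{y}} \sum_{i=1}^{M} \frac{1}{M} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i) \mathbb{1}\left\{ |\mathcal{P}(\mathbf{y})| < (1-\delta) n \kappa \alpha_n \right\} \right) \nonumber \\
&\quad + \mathbb{E}\left( \sum_{i=1}^{M} \frac{1}{M} \sum_{j \neq i} \sum_{\mathbf{y}} W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{X}_i) \mathbb{1}\left\{ |\mathcal{P}(\mathbf{y})| \geq (1-\delta) n \kappa \alpha_n \text{ and } \forall k \in \mathcal{P}(\mathbf{y})\ X_{j,k} = X_{i,k} \right\} \right) \tag{176} \\
&= \mathbb{P}_{P_{\alpha_n}^{\otimes n}}\left( |\mathcal{P}(\mathbf{Y})| < (1-\delta) n \kappa \alpha_n \right) \nonumber \\
&\quad + \sum_{j \neq 1} \sum_{\mathbf{x}} \sum_{\mathbf{x}'} \sum_{\mathbf{y}} \Pi_{\alpha_n}^{\otimes n}(\mathbf{x}) \Pi_{\alpha_n}^{\otimes n}(\mathbf{x}') W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}) \mathbb{1}\left\{ |\mathcal{P}(\mathbf{y})| \geq (1-\delta) n \kappa \alpha_n \text{ and } \forall k \in \mathcal{P}(\mathbf{y})\ x'_k = x_k \right\} \tag{177} \\
&\leq \mathbb{P}_{P_{\alpha_n}^{\otimes n}}\left( |\mathcal{P}(\mathbf{Y})| < (1-\delta) n \kappa \alpha_n \right) + \sum_{j \neq 1} \sum_{\mathbf{x}} \sum_{\mathbf{y}} \Pi_{\alpha_n}^{\otimes n}(\mathbf{x}) W_{Y|X}^{\otimes n}(\mathbf{y}|\mathbf{x}) \mathbb{1}\left\{ |\mathcal{P}(\mathbf{y})| \geq (1-\delta) n \kappa \alpha_n \right\} \alpha_n^{|\mathcal{P}(\mathbf{y})|} \tag{178} \\
&\leq \mathbb{P}_{P_{\alpha_n}^{\otimes n}}\left( |\mathcal{P}(\mathbf{Y})| < (1-\delta) n \kappa \alpha_n \right) + M \alpha_n^{(1-\delta) n \kappa \alpha_n}. \tag{179}
\end{align}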

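Since $|\mathcal{P}(\mathbf{Y})| = \sum_{i=1}^{n} \mathbb{1}\{Y_i \in \mathcal{S}\}$ and $\mathbb{E}_{P_{\alpha_n}^{\otimes n}}(|\mathcal{P}(\mathbf{Y})|) = \alpha_n n \kappa = \omega_n \sqrt{n} \kappa$, a Chernoff bound guarantees that

\begin{equation}
\mathbb{P}_{P_{\alpha_n}^{\otimes n}}\left( |\mathcal{P}(\mathbf{Y})| < (1-\delta) \alpha_n n \kappa \right) \leq e^{-\frac{1}{2} \delta^2 \kappa \omega_n \sqrt{n}}. \tag{180}
\end{equation}

Hence, for any $\mu \in ]0;1[$, choosing

\begin{equation}
\log M = (1-\mu)(1-\delta)\,\kappa \left( \frac{1}{2} + \frac{\log \omega_n^{-1}}{\log n} \right) \omega_n \sqrt{n} \log n \tag{181}
\end{equation}

ensures that

\begin{equation}
\mathbb{E}(P_{\mathrm{err}}) \leq e^{-\rho_1 \omega_n \sqrt{n}} \text{ for some appropriate choice of } \rho_1 > 0. \tag{182}
\end{equation}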
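To see why the choice (181) suffices, one may expand the second term of (179). The following is a sketch of that step, using only $n \alpha_n = \omega_n \sqrt{n}$ and $\log \alpha_n^{-1} = \frac{1}{2} \log n + \log \omega_n^{-1}$; natural logarithms are assumed.

```latex
\begin{align*}
\log\left( M \alpha_n^{(1-\delta) n \kappa \alpha_n} \right)
  &= \log M - (1-\delta)\,\kappa\,\omega_n \sqrt{n}\, \log \alpha_n^{-1} \\
  &= \log M - (1-\delta)\,\kappa \left( \frac{1}{2} + \frac{\log \omega_n^{-1}}{\log n} \right) \omega_n \sqrt{n} \log n \\
  &= -\mu (1-\delta)\,\kappa\,\omega_n \sqrt{n}\, \log \alpha_n^{-1}.
\end{align*}
```

Since $\log \alpha_n^{-1} \to \infty$, this term decays faster than $e^{-\rho\, \omega_n \sqrt{n}}$ for any fixed $\rho > 0$, so the Chernoff term (180) dominates the bound on the average probability of error.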


b) Channel resolvability analysis: The lemma used in the earlier achievability proof still applies, and one may pursue the same analysis as in that proof. In fact, the choice of $\log M$ in (181) is overwhelmingly larger than the minimum required to ensure

\begin{equation}
\mathbb{E}\left( \mathbb{D}(\widehat{Q}^n \| Q_{\alpha_n}^{\otimes n}) \right) \leq e^{-\rho_2 \omega_n \sqrt{n}} \text{ for some appropriate choice of } \rho_2 > 0. \tag{183}
\end{equation}

Note that this may be achieved without using any secret key. The final steps of the proof are identical to those in
the earlier achievability proof. ■

We may also identify the corresponding asymptotic scaling constant of log M. 


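Corollary 4. Consider a discrete memoryless covert communication channel with $P_1 \not\ll P_0$, $Q_1 \ll Q_0$, and $Q_1 \neq Q_0$.
Let $\kappa$ be defined as per (174) and $\omega_n \in o(1) \cap \omega\left(\frac{1}{\sqrt{n}}\right)$ as $n \to \infty$. Then, for any $\xi \in ]0;1[$, there exist keyless covert
communication schemes such that

\[
\lim_{n \to \infty} \mathbb{D}(\widehat{Q}^n \| Q_0^{\otimes n}) = 0, \qquad \lim_{n \to \infty} P_{\mathrm{err}} = 0,
\]

\[
\lim_{n \to \infty} \frac{\log M}{\sqrt{n \mathbb{D}(\widehat{Q}^n \| Q_0^{\otimes n})}\, \log n} = (1-\xi)\,\kappa \sqrt{\frac{2}{\chi_2(Q_1 \| Q_0)}} \left( \frac{1}{2} + \lim_{n \to \infty} \frac{\log \omega_n^{-1}}{\log n} \right).
\]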
Proof: Follows from steps identical to the proof of Corollary 1. ■
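To illustrate how the scaling constant $\kappa \sqrt{2/\chi_2(Q_1 \| Q_0)}\left(\frac{1}{2} + \lim_{n\to\infty} \frac{\log \omega_n^{-1}}{\log n}\right)$ varies with $\omega_n$, the Python sketch below evaluates it for the hypothetical family $\omega_n = n^{-\epsilon}$, for which $\frac{\log \omega_n^{-1}}{\log n} = \epsilon$. It assumes the usual chi-squared distance $\chi_2(Q_1 \| Q_0) = \sum_z (Q_1(z) - Q_0(z))^2 / Q_0(z)$; the function names and the numerical distributions are made up for this example.

```python
import math

def chi2_distance(q1, q0):
    """Chi-squared distance sum_z (Q1(z) - Q0(z))^2 / Q0(z); assumes Q1 << Q0."""
    return sum((q1[z] - q0[z]) ** 2 / q0[z] for z in q0 if q0[z] > 0)

def scaling_constant(kappa, q1, q0, eps, xi=0.0):
    """(1 - xi) * kappa * sqrt(2 / chi2) * (1/2 + eps) for omega_n = n ** (-eps)."""
    return (1.0 - xi) * kappa * math.sqrt(2.0 / chi2_distance(q1, q0)) * (0.5 + eps)

# Hypothetical warden output distributions on a binary alphabet.
Q0 = {0: 0.8, 1: 0.2}
Q1 = {0: 0.6, 1: 0.4}

for eps in (0.1, 0.25, 0.4):
    print(eps, scaling_constant(0.3, Q1, Q0, eps))
```

The printed constant grows with $\epsilon$, which is the dependence on $\omega_n$ discussed next.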

Notice that the optimal scaling constant depends on the exact choice of $\omega_n \in o(1) \cap \omega\left(\frac{1}{\sqrt{n}}\right)$. This differs from
the situation of the corresponding earlier corollary, in which the scaling constant remains the same for all choices of $\omega_n \in o(1) \cap \omega\left(\frac{1}{\sqrt{n}}\right)$.
This coding scheme turns out to be optimal, as one may establish the following converse result.

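Theorem 8. Consider a discrete memoryless covert communication channel with $P_1 \not\ll P_0$, $Q_1 \ll Q_0$, and $Q_1 \neq Q_0$.
Consider a sequence of covert communication schemes with increasing blocklength $n$ characterized by $\epsilon_n \triangleq P_{\mathrm{err}}$ and
$\delta_n \triangleq \mathbb{D}(\widehat{Q}^n \| Q_0^{\otimes n})$. If $\lim_{n \to \infty} M = \infty$ and $\lim_{n \to \infty} \epsilon_n = \lim_{n \to \infty} \delta_n = 0$, there exists $\omega_n \in o(1) \cap \omega\left(\frac{1}{\sqrt{n} \log n}\right)$ as
$n \to \infty$ such that

\[
\lim_{n \to \infty} \frac{\log M}{\sqrt{n \mathbb{D}(\widehat{Q}^n \| Q_0^{\otimes n})}\, \log n} \leq \kappa \sqrt{\frac{2}{\chi_2(Q_1 \| Q_0)}} \left( \frac{1}{2} + \lim_{n \to \infty} \frac{\log \omega_n^{-1}}{\log n} \right).
\]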

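Proof: The converse proof technique used earlier applies, but we cannot rely on the earlier lemma to bound $\mathbb{I}(X;Y)$
with $X \sim \Pi_{\mu_n}$ since $P_1 \not\ll P_0$. We use instead the following bound.

\begin{align}
\mathbb{I}(X;Y) &= (1-\mu_n)\, \mathbb{D}(P_0 \| P_{\mu_n}) + \mu_n\, \mathbb{D}(P_1 \| P_{\mu_n}) \tag{184} \\
&= (1-\mu_n) \sum_{y \in \mathcal{Y} \setminus \mathcal{S}} P_0(y) \log \frac{P_0(y)}{P_{\mu_n}(y)} + \mu_n \sum_{y \in \mathcal{Y} \setminus \mathcal{S}} P_1(y) \log \frac{P_1(y)}{P_{\mu_n}(y)} + \mu_n \kappa \log \frac{1}{\mu_n} \tag{185} \\
&\leq \log \frac{1}{1-\mu_n} + \mu_n \sum_{y \in \mathcal{Y} \setminus \mathcal{S}} P_1(y) \log \frac{P_1(y)}{P_0(y)} + \mu_n \kappa \log \mu_n^{-1}, \tag{186}
\end{align}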

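where the last inequality follows because $P_{\mu_n}(y) \geq (1-\mu_n) P_0(y)$. Since $\lim_{n \to \infty} \sqrt{n} \mu_n = 0$ for the same reason
as in the earlier converse proof, we may write $\mu_n = \frac{\omega_n}{\sqrt{n}}$ with $\omega_n = o(1)$. If $\lim_{n \to \infty} \log M = \infty$, then (87) and (186)
impose that

\begin{equation}
\lim_{n \to \infty} n \mu_n \log \mu_n^{-1} = \lim_{n \to \infty} \sqrt{n}\, \omega_n \left( \frac{1}{2} \log n + \log \omega_n^{-1} \right) = \infty. \tag{187}
\end{equation}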


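Assume that $\omega_n \in O\left(\frac{1}{\sqrt{n} \log n}\right)$. Then, there exists $0 < A < \infty$ such that, for all $n$ large enough, $\omega_n \leq \frac{A}{\sqrt{n} \log n}$.
Since $x \mapsto x \log \frac{1}{x}$ is increasing for $x \in [0, 1/e]$, we must have

\[
\lim_{n \to \infty} \sqrt{n}\, \omega_n \left( \frac{1}{2} \log n + \log \omega_n^{-1} \right) \leq \frac{A}{2} + \lim_{n \to \infty} \frac{A}{\log n} \left( \frac{1}{2} \log n + \log \log n \right) = A < \infty.
\]

This contradicts (187), therefore $\omega_n \in \omega\left(\frac{1}{\sqrt{n} \log n}\right)$. Consequently,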
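\begin{align}
\lim_{n \to \infty} \frac{\log M}{\sqrt{n \mathbb{D}(\widehat{Q}^n \| Q_0^{\otimes n})}\, \log n}
&\leq \lim_{n \to \infty} \frac{n\, \mathbb{I}(X;Y) + \mathbb{H}_b(\epsilon_n)}{\sqrt{n \mathbb{D}(\widehat{Q}^n \| Q_0^{\otimes n})}\, \log n} \tag{188} \\
&\leq \lim_{n \to \infty} \frac{-\log(1-\mu_n) + \mu_n \sum_{y \in \mathcal{Y} \setminus \mathcal{S}} P_1(y) \log \frac{P_1(y)}{P_0(y)} + \mu_n \kappa \log \mu_n^{-1} + \frac{1}{n} \mathbb{H}_b(\epsilon_n)}{\sqrt{\frac{1}{2} \mu_n^2\, \chi_2(Q_1 \| Q_0) \left(1 - \sqrt{\mu_n}\right)}\, \log n} \tag{189} \\
&= \lim_{n \to \infty} \frac{\kappa \frac{\log \mu_n^{-1}}{\log n} - \frac{\log(1-\mu_n)}{\mu_n \log n} + \frac{1}{\log n} \sum_{y \in \mathcal{Y} \setminus \mathcal{S}} P_1(y) \log \frac{P_1(y)}{P_0(y)} + \frac{1}{n \mu_n \log n} \mathbb{H}_b(\epsilon_n)}{\sqrt{\frac{1}{2} \chi_2(Q_1 \| Q_0) \left(1 - \sqrt{\mu_n}\right)}} \tag{190} \\
&= \kappa \sqrt{\frac{2}{\chi_2(Q_1 \| Q_0)}} \left( \frac{1}{2} + \lim_{n \to \infty} \frac{\log \omega_n^{-1}}{\log n} \right). \tag{191}
\end{align}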
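The passage from (190) to (191) can be checked term by term. The following sketch records the limits involved, assuming $\mu_n = \frac{\omega_n}{\sqrt{n}}$ with $\omega_n \in o(1) \cap \omega\left(\frac{1}{\sqrt{n} \log n}\right)$ and $\epsilon_n \to 0$.

```latex
\begin{align*}
\frac{\log \mu_n^{-1}}{\log n} &= \frac{1}{2} + \frac{\log \omega_n^{-1}}{\log n}, &
\frac{-\log(1-\mu_n)}{\mu_n \log n} &\to 0, \\
\frac{1}{\log n} \sum_{y \in \mathcal{Y} \setminus \mathcal{S}} P_1(y) \log \frac{P_1(y)}{P_0(y)} &\to 0, &
\frac{\mathbb{H}_b(\epsilon_n)}{n \mu_n \log n} = \frac{\mathbb{H}_b(\epsilon_n)}{\sqrt{n}\, \omega_n \log n} &\to 0.
\end{align*}
```

The second limit uses $-\log(1-\mu_n)/\mu_n \to 1$, and the last one uses $\sqrt{n}\, \omega_n \log n \to \infty$, which is exactly the condition $\omega_n \in \omega\left(\frac{1}{\sqrt{n} \log n}\right)$ established above. Only the first term survives, and $1 - \sqrt{\mu_n} \to 1$ in the denominator.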


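Note that Corollary 4 and Theorem 8 differ in the choice of scaling for $\omega_n$. However, note that for
$\omega_n \in o(1) \cap \omega\left(\frac{1}{\sqrt{n} \log n}\right)$, we have for all $n$ large enough

\begin{equation}
\frac{\log \omega_n^{-1}}{\log n} \leq \frac{\log(\sqrt{n} \log n)}{\log n} = \frac{1}{2} + \frac{\log \log n}{\log n}, \tag{192}
\end{equation}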
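so that $\lim_{n \to \infty} \frac{\log \omega_n^{-1}}{\log n} \leq \frac{1}{2}$. By choosing $\omega_n = n^{-\epsilon}$ for any $\epsilon \in ]0;1/2[$ in Corollary 4, we obtain
$\lim_{n \to \infty} \frac{\log \omega_n^{-1}}{\log n} = \epsilon$, which can be made arbitrarily close to $\frac{1}{2}$. In that regard, the converse is asymptotica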